
Python: How To Do Average Among Different Pandas Data Frame Columns?

I have the following dataset: import pandas as pd df = pd.DataFrame({'ID1': [0, 1, 0, 2, 2, 4], 'ID2': [1, 0, 3, 4, 4, 2], 'Title': ['a', 'b', 'c', 'd', 'e', 'f'], 'Weight': [3,

Solution 1:

Suppose you define

pairs = df.apply(lambda r: (min(r.ID1, r.ID2), max(r.ID1, r.ID2)), axis=1)

These are the normalized pairs of your DataFrame (lower ID first, higher ID second). You can now group by them and compute the weighted average, here the number of rows in each group divided by the sum of its weights:

>>> df.groupby(pairs).apply(lambda g: len(g) / float(g.Weight.sum()))
(0, 1)    0.250000
(0, 3)    1.000000
(2, 4)    0.428571
dtype: float64
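The row-wise `apply` above calls a Python lambda once per row. As a rough sketch of the same normalization done column-wise, the version below uses `numpy.minimum` and `numpy.maximum`. The names `low`, `high`, `stats` and `ratio` are illustrative only and do not come from the answer, and the `Weight` values are taken from the frame printed in Solution 2.

```python
import numpy as np
import pandas as pd

# Data as printed in Solution 2 on this page.
df = pd.DataFrame({
    'ID1': [0, 1, 0, 2, 2, 4],
    'ID2': [1, 0, 3, 4, 4, 2],
    'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Weight': [3, 5, 1, 1, 5, 1],
})

id1 = df['ID1'].to_numpy()
id2 = df['ID2'].to_numpy()

# Element-wise minimum/maximum give the normalized pair for every row at once.
# 'low' and 'high' are hypothetical names chosen for this sketch.
low = pd.Series(np.minimum(id1, id2), index=df.index, name='low')
high = pd.Series(np.maximum(id1, id2), index=df.index, name='high')

# Group by the two key Series, then count the rows and sum the weights per pair.
stats = df.groupby([low, high])['Weight'].agg(['count', 'sum'])

# Same quantity as in the answer: number of rows divided by the sum of weights.
ratio = stats['count'] / stats['sum']
print(ratio)
```

The result is indexed by a two-level `(low, high)` index rather than by tuples, which can make the later column handling simpler.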

To get your exact required DataFrame, some fiddling with the columns is needed, but it's basically the code above:

pairs = df.apply(lambda r: (min(r.ID1, r.ID2), max(r.ID1, r.ID2)), axis=1)
weighted = pd.merge(
    df.groupby(pairs).apply(lambda g: len(g) / float(g.Weight.sum())).reset_index(),
    df.groupby(pairs).size().reset_index(),
    left_index=True,
    right_index=True)
weighted['ID1'] = weighted['index_x'].apply(lambda p: p[0])
weighted['ID2'] = weighted['index_x'].apply(lambda p: p[1])
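# '0_y' holds the group sizes (right side of the merge),
# '0_x' holds the ratios (left side of the merge).
weighted['Total'] = weighted['0_y']
weighted['Weighted Ave'] = weighted['0_x']
weighted = weighted[['ID1', 'ID2', 'Total', 'Weighted Ave']]
>>> weighted
   ID1  ID2  Total  Weighted Ave
0    0    1      2      0.250000
1    0    3      1      1.000000
2    2    4      3      0.428571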
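The merge and column renaming in Solution 1 can probably be avoided. The sketch below is one possible variant, not the answer's own method: it overwrites `ID1` and `ID2` with the normalized values and uses pandas named aggregation to get the count and the weight sum in a single `groupby`. The helper column `weight_sum` and the frame name `norm` are made up for this example, and the `Weight` values are those printed in Solution 2.

```python
import numpy as np
import pandas as pd

# Data as printed in Solution 2 on this page.
df = pd.DataFrame({
    'ID1': [0, 1, 0, 2, 2, 4],
    'ID2': [1, 0, 3, 4, 4, 2],
    'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Weight': [3, 5, 1, 1, 5, 1],
})

id1 = df['ID1'].to_numpy()
id2 = df['ID2'].to_numpy()

# Work on a copy whose ID columns hold the normalized pair (lower, higher).
norm = df.assign(ID1=np.minimum(id1, id2), ID2=np.maximum(id1, id2))

# Named aggregation: one output column per (input column, function) pair.
# 'weight_sum' is a hypothetical helper column, dropped again below.
weighted = (
    norm.groupby(['ID1', 'ID2'], as_index=False)
        .agg(Total=('Weight', 'count'), weight_sum=('Weight', 'sum'))
)

# Number of rows in the group divided by the sum of its weights.
weighted['Weighted Ave'] = weighted['Total'] / weighted['weight_sum']
weighted = weighted[['ID1', 'ID2', 'Total', 'Weighted Ave']]
print(weighted)
```

Because the group keys are ordinary columns here, no tuple unpacking is needed to recover `ID1` and `ID2`.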

Solution 2:

You can first sort the values of ID1 and ID2 within each row using numpy.ndarray.sort, then group by these two columns and apply a custom function f:

print(df)
   ID1  ID2 Title  Weight
0    0    1     a       3
1    1    0     b       5
2    0    3     c       1
3    2    4     d       1
4    2    4     e       5
5    4    2     f       1

id1id2 = df[['ID1','ID2']].values
id1id2.sort(axis=1)
print(id1id2)
[[0 1]
 [0 1]
 [0 3]
 [2 4]
 [2 4]
 [2 4]]
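The in-place sort relies on `.values` handing back a writable array. Depending on the pandas version and its copy-on-write settings, that array may be read-only, in which case `sort` would raise an error. The sketch below shows two ways around this that should behave the same regardless: asking for an explicit copy before sorting in place, or using `numpy.sort`, which returns a new array. The names `arr` and `sorted_copy` are illustrative, and the `Weight` values are those printed in this solution.

```python
import numpy as np
import pandas as pd

# Data as printed in Solution 2 on this page.
df = pd.DataFrame({
    'ID1': [0, 1, 0, 2, 2, 4],
    'ID2': [1, 0, 3, 4, 4, 2],
    'Title': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Weight': [3, 5, 1, 1, 5, 1],
})

# Option 1: request an independent copy, then sort each row in place.
arr = df[['ID1', 'ID2']].to_numpy(copy=True)
arr.sort(axis=1)  # ndarray.sort modifies arr and returns None

# Option 2: np.sort leaves its input untouched and returns a sorted copy.
sorted_copy = np.sort(df[['ID1', 'ID2']].to_numpy(), axis=1)

print(arr)
print(sorted_copy)
```

Either array can then be assigned back with `df[['ID1','ID2']] = ...` as in the next step.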

df[['ID1','ID2']] = id1id2
print(df)
   ID1  ID2 Title  Weight
0    0    1     a       3
1    0    1     b       5
2    0    3     c       1
3    2    4     d       1
4    2    4     e       5
5    2    4     f       1

def f(x):
    #print len(x)
    #print x['Weight'].sum()
    return pd.Series({'Total':len(x), 'Weighted Av.': len(x) / float(x['Weight'].sum()) })

print(df.groupby(['ID1','ID2']).apply(f).reset_index())
   ID1  ID2  Total  Weighted Av.
0    0    1    2.0      0.250000
1    0    3    1.0      1.000000
2    2    4    3.0      0.428571
