Summarizing Rows In A Pandas Dataframe
I have the following rows:
ColumnID MenuID QuestionID ResponseCount RowID SourceColumnID SourceRowID SourceVariationID
22 -2 -2 319276487
Solution 1:
You could do something like the following:
print(df)
   ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceVariationID
0        -2      -2   319276487             28  3049400354         3049400365
1        -2      -2   319276487             31  3049400354         3049400365
2        -2      -2   319276487             37  3049400354         3049400365
3        -2      -2   319276487             28  3049400353         3049400365
4        -2      -2   319276487             45  3049400353         3049400365
5        -2      -2   319276487             46  3049400353         3049400365
6        -2      -2   319276487             26  3049400353         3049400365
7        -2      -2   319276487             33  3049400353         3049400365
8        -2      -2   319276487             39  3049400353         3049400365
9        -2      -2   319276487             26  3049400353         3049400365

def squash(group):
    # Use the first row of the group as the representative record.
    # errors='ignore' keeps this working if the grouping columns are not
    # passed to the function (as in newer pandas versions).
    x = group.iloc[0].drop(['RowID', 'SourceVariationID'], errors='ignore')
    # Replace its ResponseCount with the total for the whole group.
    x['ResponseCount'] = group['ResponseCount'].sum()
    return x

print(df.groupby(['RowID', 'SourceVariationID']).apply(squash))
                              ColumnID  MenuID  QuestionID  ResponseCount
RowID      SourceVariationID
3049400353 3049400365               -2      -2   319276487            243
3049400354 3049400365               -2      -2   319276487             96
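The same summary can probably be produced without a custom function. The following is a minimal sketch, assuming the sample rows printed above and assuming that taking the first value of each non-summed column is acceptable; the names `keys`, `how` and `summary` are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the rows printed above.
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

keys = ["RowID", "SourceVariationID"]

# Take the first value of every non-key column, except ResponseCount,
# which is summed over the group.
how = {col: "first" for col in df.columns if col not in keys}
how["ResponseCount"] = "sum"

summary = df.groupby(keys).agg(how)
print(summary)
```

Passing a dict to `agg` keeps the per-column rule explicit, which makes it easy to switch a column from "first" to another aggregation later.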
Solution 2:
Assuming that your other columns are integers:
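# Every column except the one being summed and the grouping key.
columns = df.columns.tolist()
columns.remove('ResponseCount')
columns.remove('RowID')
# Sum ResponseCount per RowID, then attach the per-group minimum of the other columns.
tempDf = df.groupby(['RowID'])[['ResponseCount']].sum()
tempDf = tempDf.join(df.groupby(['RowID'])[columns].min())
# Copy RowID from the index back into a regular column.
tempDf['RowID'] = tempDf.index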
Quick solution, not a great one! Hope this helps.
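One caveat with Solution 2: min() is computed column by column, so the resulting row only corresponds to a real record if the other columns hold a single value within each RowID. A rough sketch of how that could be checked first, assuming the sample rows shown in Solution 1 (the names `others`, `distinct` and `constant_per_group` are made up for illustration):

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in Solution 1.
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

# Columns that Solution 2 collapses with min().
others = [c for c in df.columns if c not in ("ResponseCount", "RowID")]

# Number of distinct values each of those columns takes within a RowID group.
distinct = df.groupby("RowID")[others].nunique()

# True when every such column has exactly one value per group,
# in which case min() simply returns that value.
constant_per_group = bool((distinct == 1).all().all())
print(constant_per_group)
```

If the check comes back False, the per-column minimum may mix values from different rows, and picking the first row of each group would be the safer choice.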