
Summarizing Rows In A Pandas Dataframe

I have the following rows:

ColumnID MenuID QuestionID ResponseCount RowID SourceColumnID SourceRowID SourceVariationID
22 -2 -2 319276487

Solution 1:

You could do something like the following:

print(df)
   ColumnID  MenuID  QuestionID  ResponseCount       RowID  SourceVariationID
0        -2      -2   319276487             28  3049400354         3049400365
1        -2      -2   319276487             31  3049400354         3049400365
2        -2      -2   319276487             37  3049400354         3049400365
3        -2      -2   319276487             28  3049400353         3049400365
4        -2      -2   319276487             45  3049400353         3049400365
5        -2      -2   319276487             46  3049400353         3049400365
6        -2      -2   319276487             26  3049400353         3049400365
7        -2      -2   319276487             33  3049400353         3049400365
8        -2      -2   319276487             39  3049400353         3049400365
9        -2      -2   319276487             26  3049400353         3049400365

def squash(group):
    # Use the group's first row as a template (iloc[0] also works for
    # single-row groups) and drop the grouping keys.
    x = group.iloc[0, :].drop(['RowID', 'SourceVariationID'])
    # Replace its ResponseCount with the total for the whole group.
    x['ResponseCount'] = group['ResponseCount'].sum()
    return x

print(df.groupby(['RowID', 'SourceVariationID']).apply(squash))

                              ColumnID  MenuID  QuestionID  ResponseCount
RowID      SourceVariationID                                             
3049400353 3049400365               -2      -2   319276487            243
3049400354 3049400365               -2      -2   319276487             96
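
As a variation on the above (not what this answer does), the same summary can probably be had without a custom function by handing `groupby(...).agg` a per-column rule. The sketch below rebuilds the sample frame from the printed rows and assumes, as in the data shown, that the other columns are constant within each group, so keeping the first value is enough.

```python
import pandas as pd

# Sample data copied from the printed frame in this answer.
df = pd.DataFrame({
    "ColumnID": [-2] * 10,
    "MenuID": [-2] * 10,
    "QuestionID": [319276487] * 10,
    "ResponseCount": [28, 31, 37, 28, 45, 46, 26, 33, 39, 26],
    "RowID": [3049400354] * 3 + [3049400353] * 7,
    "SourceVariationID": [3049400365] * 10,
})

# Sum ResponseCount per group; keep the first value of every other column
# (assumption: those columns do not vary within a group).
summary = df.groupby(["RowID", "SourceVariationID"]).agg(
    {
        "ColumnID": "first",
        "MenuID": "first",
        "QuestionID": "first",
        "ResponseCount": "sum",
    }
)
print(summary)
```

The result is indexed by `RowID` and `SourceVariationID`, like the `apply` output; `summary.reset_index()` turns them back into ordinary columns if that is more convenient.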

Solution 2:

Assuming that your other columns are integers:

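# Every column except the one being summed and the grouping key.
columns = df.columns.tolist()
columns.remove('ResponseCount')
columns.remove('RowID')
# Total ResponseCount per RowID.
tempDf = df.groupby(['RowID'])[['ResponseCount']].sum()
# Collapse the remaining columns to their minimum per RowID.
tempDf = tempDf.join(df.groupby(['RowID'])[columns].min())
# Copy RowID back into a regular column (it is the index after groupby).
tempDf['RowID'] = tempDf.index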

Quick solution, not a great one! Hope this helps.
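
For anyone who wants to try this approach end to end, here is a small self-contained sketch on a four-row frame. The rows are a hypothetical subset shaped like the question's data, and `reset_index()` is swapped in for the manual `tempDf['RowID'] = tempDf.index` step, so treat it as one possible tidy-up, not the only way.

```python
import pandas as pd

# Hypothetical four-row sample in the same shape as the question's data.
df = pd.DataFrame({
    "ColumnID": [-2, -2, -2, -2],
    "MenuID": [-2, -2, -2, -2],
    "QuestionID": [319276487] * 4,
    "ResponseCount": [28, 31, 28, 45],
    "RowID": [3049400354, 3049400354, 3049400353, 3049400353],
    "SourceVariationID": [3049400365] * 4,
})

# Columns that are neither summed nor used as the grouping key.
other_columns = [c for c in df.columns if c not in ("ResponseCount", "RowID")]

grouped = df.groupby("RowID")
# Sum ResponseCount, take the minimum of everything else, join on RowID.
result = grouped[["ResponseCount"]].sum().join(grouped[other_columns].min())
# Move RowID from the index back into a regular column.
result = result.reset_index()
print(result)
```

Taking `min()` of the other columns only gives a meaningful answer when they hold a single value per `RowID`, which is the case in the data shown here.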
