
Pandas Simple Correlation Of Two Grouped Dataframe Columns

Is there a good way to get the simple correlation of two grouped DataFrame columns? It seems that, no matter what I try, the pandas .corr() functions return a full correlation matrix rather than a single value per group.

Solution 1:

I would expect something like test.groupby('Name')['X'].corr('Y') to work, but it doesn't. Passing the Series itself (test['Y']) does work, but it is slower. At this point it seems apply is the best option:

test.groupby('Name').apply(lambda df: df['X'].corr(df['Y']))
Out: 
Name
A   -0.484955
B    0.520701
C    0.120879
dtype: float64

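This iterates over each group and calls Series.corr on the two columns of each group's DataFrame, giving one correlation value per group. The sample data was generated randomly without setting a seed, so the exact values will differ from run to run.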
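For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same apply approach. The sample data, the seed, and the helper name group_corr are my own assumptions for illustration (the original test frame isn't shown), so the numbers will not match the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the original `test` frame is not shown,
# so this builds a stand-in with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
test = pd.DataFrame({
    "Name": np.repeat(["A", "B", "C"], 20),
    "X": rng.normal(size=60),
    "Y": rng.normal(size=60),
})

def group_corr(df, by, x, y):
    """Return one Pearson correlation of columns x and y per group.

    Illustrative helper, not a pandas API. Selecting [[x, y]] before
    apply keeps the grouping column out of each group's sub-frame.
    """
    return df.groupby(by)[[x, y]].apply(lambda g: g[x].corr(g[y]))

result = group_corr(test, "Name", "X", "Y")
print(result)
```

The result is a Series indexed by Name with one float per group, the same shape as the output shown in the answer.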
