Splitting Groupby() In Pandas Into Smaller Groups And Combining Them
Solution 1:
You can create a helper column via GroupBy + cumcount to count the occurrences of each city. Then use dict + tuple with another GroupBy to create a dictionary of dataframes, each one containing exactly one occurrence of each city.
# add index column giving count of city occurrence
df['index'] = df.groupby('city').cumcount()
# create dictionary of dataframes
d = dict(tuple(df.groupby('index')))
Result:
print(d)
{0:
day         city     temperature  windspeed  event   index
2017-01-01  newyork           32          6  Rain        0
2017-01-01  mumbai            90          5  Sunny       0
2017-01-01  paris             45         20  Sunny       0,
 1:
day         city     temperature  windspeed  event   index
2017-01-02  newyork           36          7  Sunny       1
2017-01-02  mumbai            85         12  Fog         1
2017-01-02  paris             50         13  Cloudy      1,
 2:
day         city     temperature  windspeed  event   index
2017-01-03  newyork           28         12  Snow        2
2017-01-03  mumbai            87         15  Fog         2
2017-01-03  paris             54          8  Cloudy      2,
 3:
day         city     temperature  windspeed  event   index
2017-01-04  newyork           33          7  Sunny       3
2017-01-04  mumbai            92          5  Rain        3
2017-01-04  paris             42         10  Cloudy      3}
You can then extract individual "groups" via d[0], d[1], d[2], d[3]. In this particular case, you may wish to key the groups by date instead, i.e.
d = {df_.index[0]: df_ for _, df_ in df.groupby('index')}
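To make the two steps above concrete, here is a minimal, self-contained sketch. The six-row frame is only a small subset of the values shown in the output above, and the names by_occurrence and by_date are placeholders chosen for this illustration; it assumes, as in the question, that day is the dataframe index.

```python
import pandas as pd

# Small sample shaped like the data above: 'day' is the index, one row per city per day.
df = pd.DataFrame(
    {
        "city": ["newyork", "mumbai", "paris", "newyork", "mumbai", "paris"],
        "temperature": [32, 90, 45, 36, 85, 50],
    },
    index=pd.Index(
        pd.to_datetime(["2017-01-01"] * 3 + ["2017-01-02"] * 3), name="day"
    ),
)

# cumcount numbers the rows within each city: 0 for the first occurrence, 1 for the second, ...
df["index"] = df.groupby("city").cumcount()

# Iterating a GroupBy yields (key, sub-frame) pairs, which dict() accepts directly.
by_occurrence = dict(tuple(df.groupby("index")))

# Same sub-frames, keyed by the first date found in each one.
by_date = {sub.index[0]: sub for _, sub in df.groupby("index")}

print(by_occurrence[1]["city"].tolist())  # ['newyork', 'mumbai', 'paris']
```

Keying by date only makes sense when every city appears once per day, which is the case in this dataset; otherwise the occurrence counter is the safer key.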
Solution 2:
This is my approach. First sort your dataframe by day and city:
df = df.sort_values(by=['day', 'city'])
Next, split the dataframe into 4 groups of equal size; if the row count is not divisible by 4, the last group gets the remaining rows:
import numpy as np

n = len(df) // 4  # rows per group, rounded down
groups_n = np.cumsum([0, n, n, n, len(df) - 3 * n])  # cumulative group boundaries
print(groups_n)
OUT >> array([ 0, 6, 12, 18, 25], dtype=int32)
groups_n holds the start and end positions of each group. So for Group 1 I take df.iloc[0:6], and for Group 4 I take df.iloc[18:25].
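The boundary arithmetic can be checked on its own, without any dataframe. The sketch below generalizes it to an arbitrary number of groups; group_bounds is a hypothetical helper name introduced here for illustration, not part of the original answer.

```python
import numpy as np

def group_bounds(n_rows, n_groups=4):
    """Cumulative start/end positions; the last group absorbs the remainder."""
    size = n_rows // n_groups
    sizes = [size] * (n_groups - 1) + [n_rows - size * (n_groups - 1)]
    return np.cumsum([0] + sizes)

bounds = group_bounds(25)
print(bounds.tolist())  # [0, 6, 12, 18, 25]
```

Each consecutive pair of values (0 and 6, 6 and 12, and so on) is the slice for one group, which is why the array has one more entry than there are groups.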
So your final dictionary d, holding the 4-group split of your dataframe, will be:
d = {}
for i in range(4):
    d[i+1] = df.iloc[groups_n[i]:groups_n[i+1]]
Example Outputs:
Group 1 (d[1])
day         city     temperature  windspeed  event
2017-01-01  mumbai            90          5  Sunny
2017-01-01  newyork           32          6  Rain
2017-01-01  paris             45         20  Sunny
2017-01-02  mumbai            85         12  Fog
2017-01-02  newyork           36          7  Sunny
2017-01-02  paris             50         13  Cloudy
Group 4 (d[4])
day         city     temperature  windspeed  event
2017-01-07  mumbai            85          9  Sunny
2017-01-07  newyork           27         12  Rain
2017-01-07  paris             40         14  Rain
2017-01-08  mumbai            89          8  Rain
2017-01-08  newyork           23          7  Rain
2017-01-08  paris             42         15  Cloudy
2017-01-09  paris             53          8  Sunny
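As a quick sanity check of the whole split, the sketch below runs it on stand-in data and then combines the pieces again. The 25-row frame is made up for illustration (only its length matters), and using pd.concat to recombine is one possible choice rather than something the original answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 25 rows, values are arbitrary.
df = pd.DataFrame({"city": ["paris"] * 25, "temperature": np.arange(25)})

n = len(df) // 4
groups_n = np.cumsum([0, n, n, n, len(df) - 3 * n])

# Slice consecutive row ranges; keys 1..4 follow the numbering used above.
d = {i + 1: df.iloc[groups_n[i]:groups_n[i + 1]] for i in range(4)}

print([len(part) for part in d.values()])  # [6, 6, 6, 7]

# Concatenating the parts in key order reproduces the original frame.
combined = pd.concat(list(d.values()))
```

Because the slices are contiguous and non-overlapping, the concatenated result has the same rows in the same order as df.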