
Splitting Groupby() In Pandas Into Smaller Groups And Combining Them

                city  temperature  windspeed  event
day
2017-01-01  new york           32          6   Rain

Solution 1:

You can create a helper column via GroupBy + cumcount to count the occurrence of each city.

Then use dict + tuple with another GroupBy to create a dictionary of dataframes, each one containing exactly one occurrence of each city.

# add index column giving count of city occurrence
df['index'] = df.groupby('city').cumcount()

# create dictionary of dataframes
d = dict(tuple(df.groupby('index')))

Result:

print(d)

{0:                 city  temperature  windspeed  event  index
day
2017-01-01  new york           32          6   Rain      0
2017-01-01    mumbai           90          5  Sunny      0
2017-01-01     paris           45         20  Sunny      0,
 1:                 city  temperature  windspeed   event  index
day
2017-01-02  new york           36          7   Sunny      1
2017-01-02    mumbai           85         12     Fog      1
2017-01-02     paris           50         13  Cloudy      1,
 2:                 city  temperature  windspeed   event  index
day
2017-01-03  new york           28         12    Snow      2
2017-01-03    mumbai           87         15     Fog      2
2017-01-03     paris           54          8  Cloudy      2,
 3:                 city  temperature  windspeed   event  index
day
2017-01-04  new york           33          7   Sunny      3
2017-01-04    mumbai           92          5    Rain      3
2017-01-04     paris           42         10  Cloudy      3}

You can then extract individual "groups" via d[0], d[1], d[2], d[3]. In this particular case, where each group covers a single day, you may wish to key the dictionary by date rather than by counter:

d = {df_.index[0]: df_ for _, df_ in df.groupby('index')}
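For anyone who wants to run Solution 1 end to end, here is a minimal sketch. The toy dataframe is an assumption: it uses a subset of the rows shown above and sets day as the index, which is what df_.index[0] relies on. The names by_date and g are illustrative only.

```python
import pandas as pd

# Toy data in the same shape as the example: 'day' is the index,
# and each city appears once per day (assumed layout).
df = pd.DataFrame(
    {
        "day": pd.to_datetime(["2017-01-01"] * 3 + ["2017-01-02"] * 3),
        "city": ["new york", "mumbai", "paris"] * 2,
        "temperature": [32, 90, 45, 36, 85, 50],
    }
).set_index("day")

# 0 for each city's first row, 1 for its second, and so on
df["index"] = df.groupby("city").cumcount()

# Iterating a GroupBy yields (key, sub-dataframe) pairs, which dict() accepts
d = dict(tuple(df.groupby("index")))

# Same groups, keyed by the first date in each group instead of the counter
by_date = {g.index[0]: g for _, g in df.groupby("index")}

print(d[1]["city"].tolist())  # ['new york', 'mumbai', 'paris']
print(len(by_date))           # 2
```

Keying by date only makes sense when every row in a group shares the same day, as it does here; otherwise the counter keys are the safer choice.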

Solution 2:

This is my approach to this. First sort your dataframe by day and city:

df = df.sort_values(by=['day', 'city'])

Next, find an even split of the dataframe into 4 groups. If the number of rows is not divisible by 4, the last group gets the remaining rows:

import numpy as np

n = len(df) // 4  # rows in each of the first three groups
groups_n = np.cumsum([0, n, n, n, len(df) - 3 * n])
print(groups_n)
# OUT >> [ 0  6 12 18 25]

groups_n holds the start and end position of each group: consecutive entries bound one slice. So Group 1 is df.iloc[0:6] and Group 4 is df.iloc[18:25].
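As a quick check of the arithmetic, the sketch below recomputes the boundaries without a dataframe. The row count of 25 is an assumption taken from the printed array above; sizes, bounds and slices are illustrative names.

```python
import numpy as np

n_rows = 25                          # assumed length, matching the output above
n = n_rows // 4                      # 6 rows in each of the first three groups
sizes = [n, n, n, n_rows - 3 * n]    # the last group absorbs the remainder
bounds = np.cumsum([0] + sizes)

# Pair each start position with the end position that follows it
slices = [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]
print(slices)  # [(0, 6), (6, 12), (12, 18), (18, 25)]
```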

So your final dictionary, d, of the 4 group split of your dataframe will be:

d = {}
for i in range(4):
    d[i+1] = df.iloc[groups_n[i]:groups_n[i+1]]

Example outputs:

Group 1 (d[1]):

                city  temperature  windspeed   event
day
2017-01-01    mumbai           90          5   Sunny
2017-01-01  new york           32          6    Rain
2017-01-01     paris           45         20   Sunny
2017-01-02    mumbai           85         12     Fog
2017-01-02  new york           36          7   Sunny
2017-01-02     paris           50         13  Cloudy

Group 4 (d[4]):

                city  temperature  windspeed   event
day
2017-01-07    mumbai           85          9   Sunny
2017-01-07  new york           27         12    Rain
2017-01-07     paris           40         14    Rain
2017-01-08    mumbai           89          8    Rain
2017-01-08  new york           23          7    Rain
2017-01-08     paris           42         15  Cloudy
2017-01-09     paris           53          8   Sunny
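The same idea can be wrapped in a small helper for any number of groups, and the pieces can be put back together with pd.concat. This is only a sketch: split_rows is a hypothetical helper, not part of pandas, and the 25-row dataframe is stand-in data rather than the weather table.

```python
import numpy as np
import pandas as pd


def split_rows(df, k):
    """Split df into k consecutive chunks; the last chunk takes any remainder."""
    n = len(df) // k
    bounds = np.cumsum([0] + [n] * (k - 1) + [len(df) - (k - 1) * n])
    return {i + 1: df.iloc[bounds[i]:bounds[i + 1]] for i in range(k)}


# Stand-in data with 25 rows (assumed, to mirror the sizes above)
df = pd.DataFrame({"value": range(25)})

d = split_rows(df, 4)
sizes = [len(d[i]) for i in sorted(d)]
print(sizes)  # [6, 6, 6, 7]

# Concatenating the chunks in key order gives back the original rows
combined = pd.concat([d[i] for i in sorted(d)])
print(combined.equals(df))  # True
```

Because the chunks are consecutive iloc slices, concatenating them in key order preserves the sorted row order from the earlier sort_values step.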
