
Pandas: Custom Group-by Function

I am looking for a custom group-by function that is going to group the rows in a way such that: If there is any number and 0 it will add the number. If there are two numbers (they

Solution 1:

This may not be what you had in mind, but it should work:

start_df.groupby('id').max()

Use reset_index if you want to bring 'id' back into the columns.
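A minimal sketch of why max can stand in for a custom function here: it skips NaN within each group, and a number always wins over 0 as long as the values are non-negative. The frame below is made-up sample data (an assumption, not the asker's original start_df).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original start_df is not shown in this post
start_df = pd.DataFrame({
    "id":  [1, 1, 2, 2, 3, 3],
    "foo": [4.0, np.nan, 7.0, 0.0, np.nan, np.nan],
    "bar": [np.nan, np.nan, 4.0, 4.0, 1.0, 0.0],
})

# max() skips NaN within each group; a group that is all NaN stays NaN
result = start_df.groupby("id").max()

# reset_index() turns the 'id' index back into a regular column
result_flat = result.reset_index()
print(result_flat)
```

Without reset_index, 'id' is the index of the result rather than a column.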

Solution 2:

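I believe this is the solution you are looking for.

Here is another approach, using groupby.GroupBy.nth. Specifying as_index=False in groupby keeps the original row index. Note that nth(1) simply picks the second row of each group, so it gives the wanted values only when that row happens to hold them.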

>>> start_df.groupby('id', as_index=False).nth(1)
   id  foo  bar
1   1  4.0  NaN
3   2  7.0  4.0
5   3  NaN  1.0
7   4  9.0  6.0

OR

>>> start_df.groupby(['id'], sort=False).max().reset_index()
   id  foo  bar
0   1  4.0  NaN
1   2  7.0  4.0
2   3  NaN  1.0
3   4  9.0  6.0
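The two calls above are not interchangeable in general: nth selects by position, max by value. A small sketch on made-up data (an assumption, not the original start_df) where they differ:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: for id 1 the second row is NaN
df = pd.DataFrame({
    "id":  [1, 1, 2, 2],
    "foo": [4.0, np.nan, 1.0, 7.0],
})

# nth(1) returns the second row of each group, whatever its values,
# and keeps the original row labels
second_rows = df.groupby("id", as_index=False).nth(1)

# max() compares values within each group and skips NaN
maxima = df.groupby("id", sort=False).max().reset_index()

print(second_rows)
print(maxima)
```

For id 1, nth(1) returns the NaN row while max returns 4.0, so nth is only safe when the row order is known in advance.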

Solution 3:

Here is another approach that does not use groupby, although I can't tell whether it is more efficient. The idea is to give every id the same number of rows, so that the data can be reshaped and np.nanmax applied over one axis. To do so, you can generate a dataframe that pads the missing rows with NaN.

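import numpy as np
import pandas as pd

# count the rows of each id and find the largest group size
s = start_df.id.value_counts()
nb_max = s.max()
# create the padding dataframe: NaN rows that bring every id up to nb_max rows
df_nan = pd.DataFrame({col: np.nan if col != 'id' else [ids for ids, val in zip(s.index, nb_max - s.values)
                                                        for _ in range(val)]
                       for col in start_df.columns})
# get the result: after sorting, each id is one (nb_max, n_columns) block,
# and nanmax over axis=1 takes the max of every column within each block
result_df = pd.DataFrame(np.nanmax(pd.concat([start_df, df_nan])[start_df.columns]
                                     .sort_values('id').values
                                     .reshape((-1, nb_max, start_df.shape[1])),
                                   axis=1),
                         columns=start_df.columns)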

Note: you get a warning saying that some slices contain only NaN, but the result is still correct. There is probably a way to silence this warning.
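One possible way to silence it, sketched with the standard warnings module on a made-up array (the blocks array is an assumption that only mimics the reshaped data, one block per id):

```python
import warnings
import numpy as np

# Hypothetical reshaped data: 2 ids, 2 rows each, 2 value columns.
# The second column of the first id contains only NaN.
blocks = np.array([[[4.0, np.nan], [np.nan, np.nan]],
                   [[7.0, 4.0], [2.0, np.nan]]])

with warnings.catch_warnings():
    # ignore RuntimeWarning inside this block only; this covers the
    # all-NaN warning from nanmax, but also any other RuntimeWarning
    warnings.simplefilter("ignore", category=RuntimeWarning)
    maxima = np.nanmax(blocks, axis=1)

print(maxima)
```

The filter is restored when the with block exits, so warnings elsewhere in the program are unaffected.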
