Pandas: Custom Group-by Function
I am looking for a custom group-by function that groups the rows such that if there is a number and a 0, it adds the number. If there are two numbers (they
Solution 1:
Maybe not what you would have expected, but this should work:
start_df.groupby('id').max()
Use reset_index if you want to bring 'id' back into the columns.
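Why max() fits the rule in the question: a groupby max skips NaN by default, and the maximum of a non-negative number and 0 is that number. A minimal sketch, using a hypothetical start_df (the question's real data is not shown, so the values below are made up and assumed non-negative):

```python
import numpy as np
import pandas as pd

# Hypothetical input: two rows per id, each column holding
# a number and a 0, or only NaN (assumption; not the question's data)
start_df = pd.DataFrame({
    'id':  [1, 1, 2, 2, 3, 3, 4, 4],
    'foo': [0, 4.0, 7.0, 0, np.nan, np.nan, 9.0, 0],
    'bar': [np.nan, np.nan, 0, 4.0, 1.0, 0, 6.0, 0],
})

# max() ignores NaN, and max(x, 0) == x for any x >= 0
result = start_df.groupby('id').max()

# reset_index() moves 'id' from the index back into a column
result_flat = result.reset_index()
print(result_flat)
```

A group whose column is entirely NaN (foo for id 3 here) stays NaN, which matches the output shown in the next answer.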
Solution 2:
I believe this fits what you are looking for ideally.
I have added another approach below. Specifying as_index=False
in groupby keeps the original index when using groupby.GroupBy.nth:
>>> start_df.groupby('id', as_index=False).nth(1)
   id  foo  bar
1   1  4.0  NaN
3   2  7.0  4.0
5   3  NaN  1.0
7   4  9.0  6.0
OR
>>> start_df.groupby(['id'], sort=False).max().reset_index()
   id  foo  bar
0   1  4.0  NaN
1   2  7.0  4.0
2   3  NaN  1.0
3   4  9.0  6.0
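One caution about the nth(1) variant: GroupBy.nth selects the row at that position within each group instead of aggregating, so it only matches max() when the second row of every group already holds the wanted values, as it happens to in the output above. A short comparison on hypothetical data (made-up values, chosen so the larger foo for id 2 comes first):

```python
import numpy as np
import pandas as pd

# Hypothetical data: for id 2 the larger foo value is in the first row
start_df = pd.DataFrame({
    'id':  [1, 1, 2, 2],
    'foo': [0, 4.0, 7.0, 0],
    'bar': [np.nan, np.nan, 0, 4.0],
})

# nth(1) keeps whichever row sits at position 1 in each group
second_rows = start_df.groupby('id', as_index=False).nth(1)

# max() aggregates column by column, skipping NaN
maxima = start_df.groupby('id', sort=False).max().reset_index()

print(second_rows)
print(maxima)
```

With this data, second_rows reports foo = 0.0 for id 2 while maxima reports 7.0, so the max-based version is the one that follows the rule regardless of row order.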
Solution 3:
Here is another approach that does not use groupby, though I can't tell whether it is more efficient. The idea is to give every id the same number of rows so that the data can be reshaped and np.nanmax applied over an axis. To do so, you can generate a dataframe that pads each id with rows of NaN.
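The reshape idea on its own: once every id has the same number of rows and the rows are sorted by id, each id's rows are contiguous, so the flat array can be viewed as (number of ids, rows per id, number of columns) and np.nanmax over axis=1 collapses the rows of each id. A tiny sketch on a hypothetical, already padded array (2 ids, 2 rows each, columns id and foo):

```python
import numpy as np

# Hypothetical, already padded and sorted by id: columns are (id, foo)
values = np.array([
    [1.0, 0.0],
    [1.0, 4.0],
    [2.0, 7.0],
    [2.0, np.nan],   # padding row for id 2
])
rows_per_id = 2
n_cols = values.shape[1]

# shape (n_ids, rows_per_id, n_cols): one 2-D block per id
blocks = values.reshape((-1, rows_per_id, n_cols))

# max over the rows of each block, ignoring NaN
collapsed = np.nanmax(blocks, axis=1)
print(collapsed)
```

The axis order matters: because the data is row-major, the rows-per-id axis has to come before the column axis, otherwise values from different columns get mixed together.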
import numpy as np
import pandas as pd

# count the rows of each id
s = start_df['id'].value_counts()
nb_max = s.max()
# create a dataframe of NaN rows that pads every id up to nb_max rows
df_nan = pd.DataFrame({col: np.nan if col != 'id' else [ids for ids, val in zip(s.index, nb_max - s.values)
                                                        for _ in range(val)]
                       for col in start_df.columns})
# get the result: stack, sort so each id's rows are contiguous,
# reshape to (n_ids, nb_max, n_cols) and take the NaN-ignoring max per id
result_df = pd.DataFrame(np.nanmax(pd.concat([start_df, df_nan])[start_df.columns]
                                   .sort_values('id').values
                                   .reshape((-1, nb_max, start_df.shape[1])),
                                   axis=1),
                         columns=start_df.columns)
Note: np.nanmax emits a RuntimeWarning ("All-NaN slice encountered") when some slices contain only NaN, but the result is still correct; there is probably a way to silence this warning.
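One way to silence it, assuming you only want to hide this RuntimeWarning around the nanmax call: the warning is raised through Python's warnings module, so warnings.catch_warnings can suppress it locally. A minimal sketch on a hypothetical block where one column is all NaN for an id:

```python
import warnings

import numpy as np

# Hypothetical block for one id: the second column is all NaN
blocks = np.array([[[1.0, np.nan],
                    [1.0, np.nan]]])

with warnings.catch_warnings():
    # ignore RuntimeWarning only inside this with-block
    warnings.simplefilter('ignore', category=RuntimeWarning)
    collapsed = np.nanmax(blocks, axis=1)

print(collapsed)  # the all-NaN column stays NaN
```

The filter is restored when the with-block exits, so other RuntimeWarnings elsewhere in the program are still reported.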