Sort Values In DataFrame Using Categorical Key Without Groupby Split Apply Combine

March 03, 2023

So... I have a DataFrame that looks like this, but much larger:

         DATE ITEM STORE  STOCK
0  2018-06-06    A  L001      4
1  2018-06-06    A  L002      0
2  2018-0

Solution 1:

I've tried to improve your groupby code, so this should be a lot faster.

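import numpy as np
# Change in STOCK within each (ITEM, STORE) pair; the first row of each group is NaN, replaced by 0.
v = df.groupby(['ITEM', 'STORE'], sort=False).STOCK.diff()
df['DELTA'] = np.where(np.isnan(v), 0, v)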

Some pointers/ideas here:

Don't iterate over groups
Don't pass series as the groupers if the series belong to the same DataFrame. Pass string labels instead.
diff can be vectorized
The last line is equivalent to fillna(0), but fillna is slower than np.where
Specifying sort=False skips sorting the group keys, improving performance further

This can also be rewritten as a single line:

df['DELTA'] = df.groupby(['ITEM', 'STORE'], sort=False).STOCK.diff().fillna(0)
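
To see both variants side by side, here is a minimal sketch on a small made-up frame. Only the column names (DATE, ITEM, STORE, STOCK) come from the question; the rows themselves are hypothetical, and the sketch assumes the rows of each group should be in date order before diff is applied.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-06-06", "2018-06-06",
        "2018-06-07", "2018-06-07",
        "2018-06-08", "2018-06-08",
    ]),
    "ITEM": ["A"] * 6,
    "STORE": ["L001", "L002", "L001", "L002", "L001", "L002"],
    "STOCK": [4, 0, 3, 2, 1, 5],
})

# Assumption: diff() should compare consecutive dates, so sort by DATE first.
df = df.sort_values("DATE", kind="stable")

# Row-to-row change in STOCK within each (ITEM, STORE) group.
v = df.groupby(["ITEM", "STORE"], sort=False)["STOCK"].diff()

# Variant 1: replace the leading NaN of each group with np.where.
delta_where = np.where(np.isnan(v), 0, v)

# Variant 2: the same replacement with fillna.
delta_fillna = v.fillna(0)

df["DELTA"] = delta_fillna
print(df)
```

Both variants give the same DELTA column here; np.where returns a plain NumPy array, while fillna keeps a Series aligned on the original index.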