Sort Values In DataFrame Using Categorical Key Without Groupby Split Apply Combine
So... I have a Dataframe that looks like this, but much larger: DATE ITEM STORE STOCK 0 2018-06-06 A L001 4 1 2018-06-06 A L002 0 2 2018-0
Solution 1:
I've tried to improve your groupby code, so this should be a lot faster.
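import numpy as np

# Row-to-row change in STOCK within each (ITEM, STORE) group
v = df.groupby(['ITEM', 'STORE'], sort=False).STOCK.diff()
# The first row of each group has no predecessor (NaN), so set it to 0
df['DELTA'] = np.where(np.isnan(v), 0, v)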
Some pointers/ideas here:
- Don't iterate over groups
- Don't pass series as the groupers if the series belong to the same DataFrame. Pass string labels instead.
- diff can be vectorized
- The last line is tantamount to a fillna, but fillna is slower than np.where
- Specifying sort=False will prevent the output from being sorted by grouper keys, improving performance further
This can also be re-written as
df['DELTA'] = df.groupby(['ITEM', 'STORE'], sort=False).STOCK.diff().fillna(0)
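To see what the two versions produce, here is a minimal sketch on a small made-up frame. Only the column names (DATE, ITEM, STORE, STOCK) come from the question; the values are hypothetical, and the rows are assumed to already be in date order within each ITEM/STORE pair, since diff works on row order.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-06-06", "2018-06-06",
        "2018-06-07", "2018-06-07",
        "2018-06-08", "2018-06-08",
    ]),
    "ITEM": ["A", "A", "A", "A", "A", "A"],
    "STORE": ["L001", "L002", "L001", "L002", "L001", "L002"],
    "STOCK": [4, 0, 2, 5, 2, 1],
})

# Version 1: group-wise diff, then replace NaN with 0 via np.where.
v = df.groupby(["ITEM", "STORE"], sort=False).STOCK.diff()
delta_where = np.where(np.isnan(v), 0, v)

# Version 2: the same thing as a single chained expression with fillna.
delta_fillna = df.groupby(["ITEM", "STORE"], sort=False).STOCK.diff().fillna(0)

df["DELTA"] = delta_fillna
print(df)
```

Both versions keep the original row order and index, so the result can be assigned straight back as a column. On this sample, DELTA comes out as 0, 0, -2, 5, 0, -4.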