
Sort Values In DataFrame Using Categorical Key Without Groupby Split Apply Combine

So... I have a DataFrame that looks like this, but much larger:

         DATE ITEM STORE  STOCK
0  2018-06-06    A  L001      4
1  2018-06-06    A  L002      0
2  2018-0

Solution 1:

I've tried to improve your groupby code, so this should be a lot faster.

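import numpy as np
v = df.groupby(['ITEM', 'STORE'], sort=False).STOCK.diff()  # per-group change in STOCK
df['DELTA'] = np.where(np.isnan(v), 0, v)  # first row of each group has no previous value, so use 0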

Some pointers/ideas here:

  1. Don't iterate over groups
  2. Don't pass series as the groupers if the series belong to the same DataFrame. Pass string labels instead.
  3. diff can be vectorized: call it once on the grouped column instead of once per group
  4. The last line is tantamount to a fillna, but fillna is slower than np.where
  5. Specifying sort=False will prevent the output from being sorted by grouper keys, improving performance further
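
To make pointers 1 to 3 concrete, here is a minimal sketch that compares the loop-over-groups pattern with a single grouped diff call. The sample frame and the names `pieces`, `looped` and `vectorized` are made up for illustration; the real DataFrame from the question is much larger.

```python
import pandas as pd

# Hypothetical sample data, not the asker's actual frame.
df = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2018-06-06", "2018-06-06",
        "2018-06-07", "2018-06-07",
        "2018-06-08", "2018-06-08",
    ]),
    "ITEM": ["A"] * 6,
    "STORE": ["L001", "L002", "L001", "L002", "L001", "L002"],
    "STOCK": [4, 0, 3, 2, 1, 5],
})

# Slow pattern: iterate over the groups, diff each piece, then stitch
# the pieces back together in the original row order.
pieces = [g["STOCK"].diff() for _, g in df.groupby(["ITEM", "STORE"], sort=False)]
looped = pd.concat(pieces).sort_index()

# Vectorized pattern: string labels as groupers and one diff call on the
# grouped column. The result is already aligned with df's index.
vectorized = df.groupby(["ITEM", "STORE"], sort=False)["STOCK"].diff()

print(looped.equals(vectorized))
```

Both approaches should give the same Series; the grouped call simply avoids the Python-level loop and the concat step.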

This can also be rewritten as a one-liner using fillna:

df['DELTA'] = df.groupby(['ITEM', 'STORE'], sort=False).STOCK.diff().fillna(0)
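
As a quick sanity check, the sketch below runs both variants on a small made-up frame and confirms they fill the same values. The data and the names `via_where` and `via_fillna` are assumptions for illustration only. It does not measure speed; which variant is faster on your data is something to time yourself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two items and two stores.
df = pd.DataFrame({
    "ITEM": ["A", "A", "B", "A", "A", "B"],
    "STORE": ["L001", "L002", "L001", "L001", "L002", "L001"],
    "STOCK": [4, 0, 7, 3, 2, 10],
})

# Per-group difference; the first row of each group comes out as NaN.
v = df.groupby(["ITEM", "STORE"], sort=False)["STOCK"].diff()

# Variant 1: np.where returns a plain NumPy array.
via_where = np.where(np.isnan(v), 0, v)

# Variant 2: fillna returns a Series that keeps df's index.
via_fillna = v.fillna(0)

df["DELTA"] = via_fillna
print(df)
```

Note that `np.where` hands back an array without an index, which is fine for assigning a new column on the same frame, while `fillna` keeps the index attached.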
