Xarray Equivalent Of Pandas `qcut()` Function

I want to calculate the Decile Index - see the ex1-Calculate Decile Index (DI) with Python.ipynb. The pandas implementation is simple enough but I need help with applying the bin l

Solution 1:

I'm not sure pandas.qcut is giving you exactly what you expect; e.g. see the bins it returns in your example:

>>> test = result.to_dataframe()
>>> binned, bins = pd.qcut(test['rank_norm'], 5, labels=[1, 2, 3, 4, 5], retbins=True)

>>> bins
array([  0. ,  12.5,  37.5,  62.5,  87.5, 100. ])
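As a rough illustration of why these edges need not be evenly spaced: pandas.qcut places its edges at quantiles of the data, whereas pandas.cut uses the edges you give it. The sample values below are made up for the illustration and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed sample, not the data from the question.
values = pd.Series([1., 2., 3., 4., 100.])

# qcut: edges sit at the 0%, 50% and 100% quantiles of the data.
_, q_edges = pd.qcut(values, 2, retbins=True)

# cut: edges sit at the fixed positions chosen by the caller.
_, c_edges = pd.cut(values, bins=[0., 50., 100.], retbins=True)

print(q_edges)  # quantile-based edges, pulled toward the bulk of the data
print(c_edges)  # the fixed edges that were passed in
```

With qcut the middle edge lands on the median of the sample (3.0) rather than at the midpoint of the value range.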

If I understand correctly, you are looking to assign an integer value at each point based on the bin the point falls into. That is:

  • 0.0 <= x < 20.0: 1
  • 20.0 <= x < 40.0: 2
  • 40.0 <= x < 60.0: 3
  • 60.0 <= x < 80.0: 4
  • 80.0 <= x: 5

For this task I would probably recommend using numpy.digitize applied via xarray.apply_ufunc:

>>> bins = [0., 20., 40., 60., 80., np.inf]
>>> result = xr.apply_ufunc(np.digitize, result, kwargs={'bins': bins})
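To check how numpy.digitize treats values that sit exactly on an edge, here is a minimal sketch on a plain NumPy array. The sample values are invented for the illustration; xarray.apply_ufunc would apply the same function to every element of the DataArray.

```python
import numpy as np

# Same edges as above; each bin is closed on the left: bins[i-1] <= x < bins[i].
bins = [0., 20., 40., 60., 80., np.inf]

# Hypothetical sample values, including several that fall exactly on an edge.
sample = np.array([0., 19.9, 20., 55., 80., 100.])

labels = np.digitize(sample, bins)
print(labels)  # [1 1 2 3 5 5]
```

Note that 20.0 is assigned to bin 2 and 80.0 to bin 5, which matches the left-closed ranges listed above.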

Solution 2:

It looks like if you pass a list of bin edges, five edges define only 4 ranges, so values at or below the first edge are dropped. You can check this by looking at the number and the names of the group keys of the resulting GroupBy object:

mybins = [20., 40., 60., 80., np.inf]

decile_index_gpby = rank_norm.groupby_bins('rank_norm', bins=mybins)

len(decile_index_gpby.groups)
=> 4

decile_index_gpby.groups.keys()
=> [Interval(80.0, inf, closed='right'),
    Interval(20.0, 40.0, closed='right'),
    Interval(60.0, 80.0, closed='right'),
    Interval(40.0, 60.0, closed='right')]

To prevent the loss of 1/5th of the values, you would have to change your definition of mybins to something like:

mybins = [-np.inf, 20., 40., 60., np.inf]

which is not what you want.

So use bins=5 instead:

decile_index_gpby = rank_norm.groupby_bins('rank_norm', bins=5)

len(decile_index_gpby.groups)
=> 5

decile_index_gpby.groups.keys()
=> [Interval(80.0, 100.0, closed='right'),
    Interval(20.0, 40.0, closed='right'),
    Interval(60.0, 80.0, closed='right'),
    Interval(40.0, 60.0, closed='right'),
    Interval(-0.1, 20.0, closed='right')]
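Since groupby_bins relies on pandas.cut for the binning, the edges it produces can be inspected directly with pandas. The following is a small sketch with made-up values spanning 0 to 100, not the rank_norm data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical values spanning 0..100, standing in for rank_norm.
values = np.array([0., 10., 20., 50., 80., 100.])

# bins=5 asks for five equal-width, right-closed intervals over the data range.
binned, edges = pd.cut(values, bins=5, retbins=True)

print(edges)             # lowest edge is nudged slightly below 0 so that 0 is included
print(binned.codes + 1)  # 1-based bin number per value: [1 1 1 3 4 5]
```

Because these intervals are closed on the right, a value of exactly 20.0 lands in bin 1 here, whereas the numpy.digitize approach in Solution 1 puts it in bin 2. Which convention is appropriate depends on how the Decile Index classes are defined.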
