Xarray Equivalent Of Pandas `qcut()` Function
I want to calculate the Decile Index - see the ex1-Calculate Decile Index (DI) with Python.ipynb. The pandas implementation is simple enough but I need help with applying the bin l
Solution 1:
I'm not sure pandas.qcut is giving you exactly what you expect; for example, see the bins it returns in your example:
>>> test = result.to_dataframe()
>>> binned, bins = pd.qcut(test['rank_norm'], 5, labels=[1, 2, 3, 4, 5], retbins=True)
>>> bins
array([ 0. , 12.5, 37.5, 62.5, 87.5, 100. ])
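The edges above are not 0, 20, 40, ... because qcut places them at the quantiles of the data rather than at fixed values. A minimal sketch of that behaviour follows; the sample values are made up for illustration and are not the data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the 'rank_norm' column (values in 0..100).
values = pd.Series([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 50.0, 70.0, 100.0])

# qcut puts the edges at the sample quantiles so each bin holds about the same count.
binned, qcut_edges = pd.qcut(values, 5, labels=[1, 2, 3, 4, 5], retbins=True)

# The same edges computed directly as the 0%, 20%, ..., 100% quantiles.
quantile_edges = np.quantile(values, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0])

print(qcut_edges)
print(quantile_edges)
print(binned.value_counts().sort_index())
```

Because this sample is skewed toward low values, the quantile edges sit well below 20, 40, 60 and 80, while every bin still receives the same number of points.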
If I understand correctly, you are looking to assign an integer value at each point based on the bin the point falls into. That is:
0.0 <= x < 20.0 : 1
20.0 <= x < 40.0 : 2
40.0 <= x < 60.0 : 3
60.0 <= x < 80.0 : 4
80.0 <= x : 5
For this task I would probably recommend using numpy.digitize applied via xarray.apply_ufunc:
>>> bins = [0., 20., 40., 60., 80., np.inf]
>>> result = xr.apply_ufunc(np.digitize, result, kwargs={'bins': bins})
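As a self-contained sketch of the same idea, the snippet below builds a small DataArray by hand and bins it. The array contents and the dimension names lat and lon are assumptions chosen for illustration, not taken from the question's data:

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 grid of normalised ranks in the 0..100 range.
rank_norm = xr.DataArray(
    np.array([[0.0, 19.9, 20.0], [55.0, 80.0, 100.0]]),
    dims=("lat", "lon"),
    name="rank_norm",
)

# Left-closed bins: digitize returns i such that bins[i-1] <= x < bins[i].
bins = [0.0, 20.0, 40.0, 60.0, 80.0, np.inf]

# apply_ufunc applies np.digitize elementwise and keeps dims and coords.
decile_index = xr.apply_ufunc(np.digitize, rank_norm, kwargs={"bins": bins})

print(decile_index.values)
```

Using np.inf as the last edge keeps the top value (100.0) in class 5 instead of pushing it into a sixth class.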
Solution 2:
It looks like if you use a list of five edges to define your bins then it will only generate 4 ranges, because n edges delimit n - 1 intervals. You can check this by looking at the length and the keys of the groups of the resulting GroupBy object:
mybins = [20., 40., 60., 80., np.inf]
decile_index_gpby = rank_norm.groupby_bins('rank_norm', bins=mybins)
len(decile_index_gpby.groups)
=> 4
decile_index_gpby.groups.keys()
=> [Interval(80.0, inf, closed='right'),
Interval(20.0, 40.0, closed='right'),
Interval(60.0, 80.0, closed='right'),
Interval(40.0, 60.0, closed='right')]
To prevent the loss of 1/5th of the values, you would have to change your definition of mybins to something like:
mybins = [-np.inf, 20., 40., 60., np.inf]
which is not what you want, since it still yields only four bins.
So use bins=5 instead:
decile_index_gpby = rank_norm.groupby_bins('rank_norm', bins=5)
len(decile_index_gpby.groups)
=> 5
decile_index_gpby.groups.keys()
=> [Interval(80.0, 100.0, closed='right'),
Interval(20.0, 40.0, closed='right'),
Interval(60.0, 80.0, closed='right'),
Interval(40.0, 60.0, closed='right'),
Interval(-0.1, 20.0, closed='right')]
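The difference between the two calls can be reproduced with pandas.cut alone, assuming (as the xarray documentation for groupby_bins describes) that the bins argument is handed to pandas.cut. The sample values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values in the 0..100 range.
values = pd.Series([0.0, 10.0, 20.0, 30.0, 50.0, 70.0, 90.0, 100.0])

# Five explicit edges delimit only four right-closed intervals; anything
# at or below the first edge (20.0) falls outside every bin and becomes NaN.
by_edges = pd.cut(values, bins=[20.0, 40.0, 60.0, 80.0, np.inf])
n_edge_bins = len(by_edges.cat.categories)
n_dropped = int(by_edges.isna().sum())

# An integer asks for that many equal-width bins spanning the data range;
# the lowest edge is nudged just below the minimum so the minimum is included.
by_count = pd.cut(values, bins=5)
n_count_bins = len(by_count.cat.categories)
lowest = by_count.cat.categories[0]

print(n_edge_bins, n_dropped)
print(n_count_bins, lowest)
```

Note that an integer bins gives equal-width intervals, as pandas.cut does, not the quantile-based intervals of pandas.qcut; the two coincide only when the values are spread evenly over the range.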