Ranking Of Numpy Array With Possible Duplicates
I have a numpy array of floats/ints and want to map its elements into their ranks. If an array doesn't have duplicates the problem can be solved by the following code In [49]: a1 O
Solution 1:
You can do reasonably well using np.unique and np.bincount (tied values all get their maximum rank):
>>> u, v = np.unique(a2, return_inverse=True)
>>> (np.cumsum(np.bincount(v)) - 1)[v]
array([0, 3, 4, 5, 6, 3, 7, 9, 9, 3])
Or, for the minimum rank:
>>> (np.cumsum(np.concatenate(([0], np.bincount(v)))))[v]
array([0, 1, 4, 5, 6, 1, 7, 8, 8, 1])
There's a minor speedup from telling bincount how many bins to provide:
(np.cumsum(np.bincount(v, minlength=u.size)) - 1)[v]
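To make the two variants above easier to compare, here is a minimal sketch that wraps them in one function. The name rank_with_ties and its method argument are made up for illustration, and it assumes a 1-D input. The minimum rank is computed as cumsum minus counts, which is equivalent to the concatenate trick above.

```python
import numpy as np

def rank_with_ties(a, method="max"):
    """Zero-based ranks of a 1-D array; tied values share one rank.

    Hypothetical helper: method="max" gives ties their highest rank,
    method="min" gives them their lowest rank.
    """
    a = np.asarray(a)
    # inverse[i] is the position of a[i] among the sorted unique values
    uniq, inverse = np.unique(a, return_inverse=True)
    # counts[k] is how many times the k-th unique value occurs
    counts = np.bincount(inverse, minlength=uniq.size)
    if method == "max":
        ranks = np.cumsum(counts) - 1
    elif method == "min":
        ranks = np.cumsum(counts) - counts
    else:
        raise ValueError("method must be 'min' or 'max'")
    return ranks[inverse]

a2 = np.array([0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])
print(rank_with_ties(a2, "max"))  # [0 3 4 5 6 3 7 9 9 3]
print(rank_with_ties(a2, "min"))  # [0 1 4 5 6 1 7 8 8 1]
```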
Solution 2:
After upgrading to the latest version of scipy, as suggested by @WarrenWeckesser in the comments, scipy.stats.rankdata seems to be faster than both scipy.stats.mstats.rankdata and np.searchsorted, which makes it the fastest way to do this on larger arrays.
In [1]: import numpy as np
In [2]: from scipy.stats import rankdata as rd
...: from scipy.stats.mstats import rankdata as rd2
In [3]: array = np.arange(0.1, 1000000.1)
In [4]: %timeit np.searchsorted(np.sort(array), array)
1 loops, best of 3: 385 ms per loop
In [5]: %timeit rd(array)
10 loops, best of 3: 109 ms per loop
In [6]: %timeit rd2(array)
1 loops, best of 3: 205 ms per loop
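The timings above only use an array without duplicates. For the duplicate case from the question, a rough sketch might look like the following. It assumes a scipy version whose rankdata accepts the method argument. rankdata returns 1-based ranks, so 1 is subtracted to get zero-based ranks.

```python
import numpy as np
from scipy.stats import rankdata

a2 = np.array([0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])

# rankdata is 1-based, so shift by one to get zero-based ranks
min_ranks = rankdata(a2, method="min").astype(int) - 1
max_ranks = rankdata(a2, method="max").astype(int) - 1

print(min_ranks)  # [0 1 4 5 6 1 7 8 8 1]
print(max_ranks)  # [0 3 4 5 6 3 7 9 9 3]
```

The default method is "average", which gives tied values the mean of their ranks and returns floats.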
Solution 3:
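Here is a function that returns the output you want (in the first case):
from numpy import array, sort
def argsortdup(a1):
    sorted = sort(a1)
    ranked = []
    for item in a1:
        # searchsorted returns the leftmost index, i.e. the first duplicate
        ranked.append(sorted.searchsorted(item))
    return array(ranked)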
Basically, you sort the array and then, for each item, search for its index in the sorted copy. When there are duplicates, the index of the first instance is returned. I tested it with your a2 example, doing something like
a3 = argsortdup(a2)
array([0, 1, 4, 5, 6, 1, 7, 8, 8, 1])
"Test with a2":
>>> a2
array([ 0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])
>>> def argsortdup(a1):
...     sorted = sort(a1)
...     ranked = []
...     for item in a1:
...         ranked.append(sorted.searchsorted(item))
...     return array(ranked)
...
>>> a3 = argsortdup(a2)
>>> a2
array([ 0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])
>>> a3
array([0, 1, 4, 5, 6, 1, 7, 8, 8, 1])
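The Python loop in argsortdup can probably be dropped, since np.searchsorted accepts a whole array of values at once. A minimal sketch, assuming a 1-D input (the function name argsortdup_vectorized and the side parameter wrapper are made up here):

```python
import numpy as np

def argsortdup_vectorized(a, side="left"):
    """Zero-based ranks via a single searchsorted call (hypothetical helper).

    side="left" gives tied values their lowest rank;
    side="right" gives them their highest rank.
    """
    a = np.asarray(a)
    sorted_a = np.sort(a)
    idx = np.searchsorted(sorted_a, a, side=side)
    # side="right" points one past the last duplicate, so step back by one
    return idx - 1 if side == "right" else idx

a2 = np.array([0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])
print(argsortdup_vectorized(a2))           # [0 1 4 5 6 1 7 8 8 1]
print(argsortdup_vectorized(a2, "right"))  # [0 3 4 5 6 3 7 9 9 3]
```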