Python: Remove Duplicates From A Multi-dimensional Array
In Python, numpy.unique can remove all duplicates from a 1D array very efficiently. 1) How can duplicate rows or columns be removed from a 2D array? 2) How about for nD arrays?
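For reference, a minimal sketch of both cases with plain NumPy (assuming NumPy 1.13 or newer, where np.unique gained an axis argument):

import numpy as np

x = np.array([1, 2, 1, 3, 2])
print(np.unique(x))          # 1D case: [1 2 3]

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
print(np.unique(a, axis=0))  # unique rows
print(np.unique(a, axis=1))  # unique columns

The answers below predate that axis argument and show the alternatives.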
Solution 1:
If possible I would use pandas.
In [1]: from pandas import DataFrame
In [2]: import numpy as np
In [3]: a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
In [4]: DataFrame(a).drop_duplicates().values
Out[4]:
array([[1, 1],
       [2, 3],
       [5, 4]], dtype=int64)
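The question also asks about duplicate columns; the same pandas trick handles that case via a transpose. A small sketch, not part of the original answer:

import numpy as np
from pandas import DataFrame

a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
unique_rows = DataFrame(a).drop_duplicates().values      # drop duplicate rows
unique_cols = DataFrame(a.T).drop_duplicates().values.T  # drop duplicate columns via a transpose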
Solution 2:
The following is another approach that performs much better than a for loop: roughly 2 seconds for an array of 10k rows plus 100 duplicates.
def tuples(A):
    # Recursively convert a (nested) array into nested tuples, which are hashable.
    try:
        return tuple(tuples(x) for x in A)
    except TypeError:
        return A

b = set(tuples(a))  # set membership drops the duplicate rows
The idea is inspired by the first part of Waleed Khan's answer. It needs no additional package, which may make it useful in other settings as well, and it is also fairly Pythonic, I think.
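If a NumPy array is needed afterwards, the set of tuples converts back easily; a minimal sketch, continuing from the snippet above (b is the set of row-tuples):

import numpy as np

unique_rows = np.array(sorted(b))  # back to a 2D array; the original row order is not preserved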
Solution 3:
The numpy_indexed package solves this problem for the n-dimensional case (disclaimer: I am its author). In fact, solving this problem was the motivation for starting the package, but it has since grown to include a lot of related functionality.
import numpy as np
import numpy_indexed as npi

a = np.random.randint(0, 2, (3, 3, 3))  # small random 3D array, so duplicate slices are likely
print(npi.unique(a))          # unique sub-arrays along the first axis
print(npi.unique(a, axis=1))  # unique sub-arrays along axis 1
print(npi.unique(a, axis=2))  # unique sub-arrays along axis 2
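As a quick usage check, the same call handles the 2D example from the question; a small sketch, assuming (as the default-axis call above suggests) that npi.unique treats the first axis as the sequence axis:

import numpy as np
import numpy_indexed as npi

a2 = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
print(npi.unique(a2))  # unique rows, comparable to the pandas result above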