How To Efficiently Split Scipy Sparse And Numpy Arrays Into Smaller N Unequal Chunks?
Solution 1:
Make a sparse matrix (with numpy imported as np and sparse imported from scipy):
In [62]: M=(sparse.rand(10,3,.3,'csr')*10).astype(int)
In [63]: M
Out[63]:
<10x3 sparse matrix of type '<class 'numpy.int32'>' with 9 stored elements in Compressed Sparse Row format>
In [64]: M.A
Out[64]:
array([[0, 7, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 5],
[0, 0, 2],
[0, 0, 6],
[0, 4, 4],
[7, 1, 0],
[0, 0, 2]])
The dense equivalent is easily split. array_split handles unequal chunks, but you can also spell out the split as illustrated in the other answer.
In [65]: np.array_split(M.A, 3)
Out[65]:
[array([[0, 7, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]), array([[0, 0, 5],
[0, 0, 2],
[0, 0, 6]]), array([[0, 4, 4],
[7, 1, 0],
[0, 0, 2]])]
In general, numpy functions cannot work directly on sparse matrices, because a sparse matrix is not a subclass of ndarray. Unless the function delegates the action to the object's own method, it probably won't work. Often the function starts with np.asarray(M), which is not the same as M.toarray() (try it yourself).
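To see the difference mentioned above, here is a minimal sketch. The small matrix is an arbitrary stand-in, not the M from the session, and the exact printed form may vary with your scipy version.

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix (values are arbitrary).
M = sparse.csr_matrix(np.array([[0, 7, 0], [0, 0, 5], [7, 1, 0]]))

wrapped = np.asarray(M)  # wraps the sparse object itself in an object array
dense = M.toarray()      # converts the stored values to a regular ndarray

print(wrapped.dtype, wrapped.shape)  # object dtype, not a 3x3 numeric array
print(dense.dtype, dense.shape)
```

This is why a function that begins with np.asarray(M) usually ends up operating on a wrapper around the matrix rather than on its values.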
But split is nothing more than slicing along the desired axis. I can produce the same 4,3,3 split with:
In [143]: alist = [M[0:4,:], M[4:7,:], M[7:10]]
In [144]: alist
Out[144]:
[<4x3 sparse matrix of type '<class 'numpy.int32'>'
with 1 stored elements in Compressed Sparse Row format>,
<3x3 sparse matrix of type '<class 'numpy.int32'>'
with 3 stored elements in Compressed Sparse Row format>,
<3x3 sparse matrix of type '<class 'numpy.int32'>'
with 5 stored elements in Compressed Sparse Row format>]
In [145]: [m.A for m in alist]
Out[145]:
[array([[0, 7, 0],
[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=int32), array([[0, 0, 5],
[0, 0, 2],
[0, 0, 6]], dtype=int32), array([[0, 4, 4],
[7, 1, 0],
[0, 0, 2]], dtype=int32)]
The rest is administrative details.
I should add that sparse slices are never views. They are new sparse matrices with their own data attribute.
With the split indexes in a list, we can construct the split list with a simple iteration:
In [146]: idx = [0,4,7,10]
In [149]: alist = []
In [150]: for i in range(len(idx)-1):
...: alist.append(M[idx[i]:idx[i+1]])
I haven't worked out the details of how to construct idx, though an obvious starting point is the 10, that is, M.shape[0].
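One possible way to build idx, sketched here under the assumption that you want the same chunk sizes np.array_split would give (the first few chunks get one extra row). The helper name split_rows is hypothetical, and the test matrix is an arbitrary stand-in.

```python
import numpy as np
from scipy import sparse

def split_rows(M, n_chunks):
    """Hypothetical helper: split sparse M into n_chunks row blocks."""
    n_rows = M.shape[0]
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks get one extra row, as np.array_split does.
    sizes = [base + 1] * extra + [base] * (n_chunks - extra)
    # Turn chunk sizes into boundaries, e.g. [4, 3, 3] -> [0, 4, 7, 10].
    idx = [0] + np.cumsum(sizes).tolist()
    return [M[idx[i]:idx[i + 1]] for i in range(n_chunks)]

# Arbitrary 10x3 stand-in matrix with some zeros.
M = sparse.csr_matrix(np.arange(30).reshape(10, 3) % 4)
chunks = split_rows(M, 3)
print([c.shape for c in chunks])  # [(4, 3), (3, 3), (3, 3)]
```

Each chunk stays sparse, so nothing is converted to a dense array along the way.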
For even splits (that fit):
In [160]: [M[i:i+5,:] for i in range(0,M.shape[0],5)]
Out[160]:
[<5x3 sparse matrix of type '<class 'numpy.int32'>' with 2 stored elements in Compressed Sparse Row format>,
<5x3 sparse matrix of type '<class 'numpy.int32'>' with 7 stored elements in Compressed Sparse Row format>]
Solution 2:
First, convert the scipy.sparse.csr_matrix to a numpy ndarray, then pass a list of split indices to numpy.split(ary, indices_or_sections, axis=0).
From the numpy documentation: "If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in ary[:2], ary[2:3], ary[3:]."
https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html
X1,X2,X3=np.split(X.toarray(), [1000,2000])
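Note that X.toarray() materializes the whole matrix in memory, which may be a problem for large X. As a rough sketch, the same index list can be applied by slicing the sparse matrix directly. The X below is a small stand-in and the indices [2, 3] are illustrative, not the [1000, 2000] from the line above.

```python
import numpy as np
from scipy import sparse

# Stand-in for X; the size and values are illustrative assumptions.
X = sparse.csr_matrix(np.arange(40).reshape(10, 4) % 3)
indices = [2, 3]  # same meaning as indices_or_sections in np.split

# Dense route from the answer: np.split on the converted array.
dense_parts = np.split(X.toarray(), indices)

# Sparse route: pad the indices with 0 and the row count, then slice.
bounds = [0] + indices + [X.shape[0]]
sparse_parts = [X[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

# Both routes should give the same values, chunk by chunk.
for d, s in zip(dense_parts, sparse_parts):
    assert np.array_equal(d, s.toarray())
```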