How Do You Efficiently Sum The Occurences Of A Value In One Array At Positions In Another Array
Im looking for an efficient 'for loop' avoiding solution that solves an array related problem I'm having. I want to use a huge 1Darray (A -> size = 250.000) of values between 0
Solution 1:
You are looking for a sparse tensor:
import torch
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
idx = torch.LongTensor([A, B])
torch.sparse.FloatTensor(idx, torch.ones(idx.shape[1]), torch.Size([5,3])).to_dense()
Output:
tensor([[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.],
[0., 0., 2.],
[1., 0., 0.]])
You can also do the same with scipy sparse matrix:
import numpy as np
from scipy.sparse import coo_matrix
coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()
output:
array([[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.],
[0., 0., 2.],
[1., 0., 0.]])
Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.
Solution 2:
Use numpy.add.at:
import numpy as np
A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
arr = np.zeros((5, 3))
np.add.at(arr, (A, B), 1)
print(arr)
Output
[[0. 1. 0.][0. 0. 0.][0. 0. 1.][0. 0. 2.][1. 0. 0.]]
Solution 3:
Given that the numbers are in a small range, bincount
would be a good choice for bin-based summing -
def accumulate_coords(A,B):
nrows = A.max()+1
ncols = B.max()+1
return np.bincount(A*ncols+B,minlength=nrows*ncols).reshape(-1,ncols)
Sample run -
In[55]: AOut[55]: array([0, 3, 2, 4, 3])
In[56]: BOut[56]: array([1, 2, 2, 0, 2])
In[58]: accumulate_coords(A,B)
Out[58]:
array([[0, 1, 0],
[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[1, 0, 0]])
Post a Comment for "How Do You Efficiently Sum The Occurences Of A Value In One Array At Positions In Another Array"