
How Do You Efficiently Sum The Occurrences Of A Value In One Array At Positions In Another Array

I'm looking for an efficient solution that avoids a 'for loop' to solve an array-related problem I'm having. I want to use a huge 1D array (A -> size = 250.000) of values between 0

Solution 1:

You are looking for a sparse tensor:

import torch

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]
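idx = torch.tensor([A, B])  # 2 x N index tensor: row indices, then column indices
# torch.sparse.FloatTensor is a legacy constructor; torch.sparse_coo_tensor replaces it.
# Duplicate (row, col) entries are summed when converting to dense.
torch.sparse_coo_tensor(idx, torch.ones(idx.shape[1]), (5, 3)).to_dense()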

Output:

tensor([[0., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.],
        [0., 0., 2.],
        [1., 0., 0.]])

You can also do the same with a SciPy sparse matrix:

import numpy as np
from scipy.sparse import coo_matrix

coo_matrix((np.ones(len(A)), (np.array(A), np.array(B))), shape=(5,3)).toarray()

Output:

array([[0., 1., 0.],
       [0., 0., 0.],
       [0., 0., 1.],
       [0., 0., 2.],
       [1., 0., 0.]])

Sometimes it is better to leave the matrix in its sparse representation, rather than forcing it to be "dense" again.
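As a rough sketch of what staying sparse could look like (reusing the same sample A and B; the variable names counts and counts_csr are only illustrative), the matrix can be converted to CSR, which sums duplicate coordinates, and then queried without ever building the dense array:

```python
import numpy as np
from scipy.sparse import coo_matrix

A = np.array([0, 3, 2, 4, 3])
B = np.array([1, 2, 2, 0, 2])

# Build the count matrix but keep it sparse.
counts = coo_matrix((np.ones(len(A)), (A, B)), shape=(5, 3))

# COO stores duplicate (row, col) entries separately;
# converting to CSR sums them into a single stored value.
counts_csr = counts.tocsr()

print(counts_csr.nnz)    # number of stored non-zero cells
print(counts_csr[3, 2])  # count at row 3, column 2
```

This is likely to matter most when the index ranges are large and most cells stay empty; for a small dense result like the 5 x 3 example, converting to an array is fine.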

Solution 2:

Use numpy.add.at:

import numpy as np

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]

arr = np.zeros((5, 3))
np.add.at(arr, (A, B), 1)

print(arr)

Output:

[[0. 1. 0.]
 [0. 0. 0.]
 [0. 0. 1.]
 [0. 0. 2.]
 [1. 0. 0.]]
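It may help to see why np.add.at is needed instead of plain fancy-index assignment. The short comparison below (the names buffered and unbuffered are just for illustration) uses the same sample data, where the coordinate (3, 2) appears twice:

```python
import numpy as np

A = [0, 3, 2, 4, 3]
B = [1, 2, 2, 0, 2]

# Plain fancy indexing is buffered: a repeated index is written only once.
buffered = np.zeros((5, 3))
buffered[A, B] += 1

# np.add.at is unbuffered: every occurrence of an index is accumulated.
unbuffered = np.zeros((5, 3))
np.add.at(unbuffered, (A, B), 1)

print(buffered[3, 2], unbuffered[3, 2])
```

With buffered indexing the duplicate is silently lost, so the count at (3, 2) comes out as 1 rather than 2.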

Solution 3:

Given that the numbers are in a small range, bincount would be a good choice for bin-based summing -

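import numpy as np

def accumulate_coords(A, B):
    # A and B must be NumPy integer arrays of equal length
    nrows = A.max() + 1
    ncols = B.max() + 1
    # Flatten each (row, col) pair to one row-major index, then count occurrences
    return np.bincount(A * ncols + B, minlength=nrows * ncols).reshape(-1, ncols)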

Sample run -

In [55]: A
Out[55]: array([0, 3, 2, 4, 3])

In [56]: B
Out[56]: array([1, 2, 2, 0, 2])

In [58]: accumulate_coords(A, B)
Out[58]:
array([[0, 1, 0],
       [0, 0, 0],
       [0, 0, 1],
       [0, 0, 2],
       [1, 0, 0]])
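To make the flattening step more concrete, here is a hedged sketch on randomly generated data of the size mentioned in the question. The value ranges (nrows = 5, ncols = 3) are assumptions chosen to match the sample, since the question's actual ranges are not shown. It checks that A * ncols + B is the row-major flat index and that the bincount result agrees with np.add.at:

```python
import numpy as np

rng = np.random.default_rng(0)
nrows, ncols = 5, 3  # assumed ranges, for illustration only
A = rng.integers(0, nrows, size=250_000)
B = rng.integers(0, ncols, size=250_000)

# A * ncols + B is the row-major flat index of cell (A[i], B[i]).
flat = A * ncols + B
assert np.array_equal(flat, np.ravel_multi_index((A, B), (nrows, ncols)))

# Count occurrences of each flat index, then restore the 2D shape.
via_bincount = np.bincount(flat, minlength=nrows * ncols).reshape(nrows, ncols)

# Reference result from unbuffered accumulation.
via_add_at = np.zeros((nrows, ncols), dtype=np.int64)
np.add.at(via_add_at, (A, B), 1)

print(np.array_equal(via_bincount, via_add_at))
```

Passing the shape explicitly, rather than deriving it from A.max() and B.max(), keeps the output size fixed even when the largest value happens not to occur in the data.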
