
Computing Autocorrelation Of Vectors With Numpy

I'm struggling to come up with a non-obfuscated, efficient way of using NumPy to compute the autocorrelation function of a set of vectors in 3D space.

Solution 1:

The NumPy correlation routines work on 1D arrays. As a "minimal" improvement, use a vectorized operation for the normalisation step (the np.arange expression on the second-to-last line):

import numpy as np

def vector_autocorrelate(t_array):
  n_vectors = len(t_array)
  # correlate each component independently; keep only non-negative lags
  acorr = np.array([np.correlate(t_array[:, i], t_array[:, i], 'full')
                    for i in range(3)])[:, n_vectors - 1:]
  # sum the correlations over the components
  acorr = np.sum(acorr, axis=0)
  # divide by the number of values actually measured and return
  acorr = acorr / (n_vectors - np.arange(n_vectors))
  return acorr

For larger array sizes, you should consider using an FFT-based algorithm for the correlation. Check out the examples of the library tidynamics if you are interested in that (disclaimer: I wrote the library; it depends only on NumPy).
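For illustration, the FFT route can be sketched directly in NumPy via the Wiener-Khinchin theorem. The function name fft_vector_autocorrelate below is my own choice for this sketch; it is not part of tidynamics:

```python
import numpy as np

def fft_vector_autocorrelate(t_array):
    # O(N log N) autocorrelation via the Wiener-Khinchin theorem:
    # the FFT times its own conjugate is the power spectrum, whose
    # inverse FFT is the (circular) autocorrelation.
    n = len(t_array)
    # zero-pad to length 2*n so the circular correlation equals the linear one
    f = np.fft.rfft(t_array, n=2 * n, axis=0)
    acorr = np.fft.irfft(f * f.conj(), axis=0)[:n]
    # sum the three components and normalise by the number of overlapping terms
    return acorr.sum(axis=1) / (n - np.arange(n))
```

On random (N, 3) input this matches the direct double-loop definition to floating-point tolerance, while scaling as N log N instead of N**2.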

For reference, here are the timings for a NumPy snippet (that I wrote for testing), your code vector_autocorrelate, and tidynamics.

import time
import numpy as np
import tidynamics

size = [2**i for i in range(4, 17, 2)]

np_time = []
ti_time = []
va_time = []
for s in size:
    data = np.random.random(size=(s, 3))
    t0 = time.time()
    # keep only the non-negative lags (index s-1 of the 'full' output is lag 0)
    correlate = np.array([np.correlate(data[:, i], data[:, i], mode='full')
                          for i in range(data.shape[1])])[:, s - 1:]
    correlate = np.sum(correlate, axis=0) / (s - np.arange(s))
    np_time.append(time.time() - t0)
    t0 = time.time()
    correlate = tidynamics.acf(data)
    ti_time.append(time.time() - t0)
    t0 = time.time()
    correlate = vector_autocorrelate(data)
    va_time.append(time.time() - t0)

You can see the results:

print("size", size)
print("np_time", np_time)
print("va_time", va_time)
print("ti_time", ti_time)

size [16, 64, 256, 1024, 4096, 16384, 65536]

np_time [0.00023794174194335938, 0.0002703666687011719, 0.0002713203430175781, 0.001544952392578125, 0.0278470516204834, 0.36094141006469727, 6.922360420227051]

va_time [0.00021696090698242188, 0.0001690387725830078, 0.000339508056640625, 0.0014629364013671875, 0.024930953979492188, 0.34442687034606934, 7.005630731582642]

ti_time [0.0011148452758789062, 0.0008449554443359375, 0.0007512569427490234, 0.0010488033294677734, 0.0026645660400390625, 0.007939338684082031, 0.048232316970825195]

or plot them:

import matplotlib.pyplot as plt

plt.plot(size, np_time, label='np_time')
plt.plot(size, va_time, label='va_time')
plt.plot(size, ti_time, label='ti_time')
plt.loglog()
plt.legend()
plt.show()

For anything but very small data series, the quadratic ("N**2") algorithm is unusable.

Solution 2:

Here's my result. It is slower (about 4x) and gives different results. Why do I post it anyway? I thought it was worth seeing how to measure, and what the difference is. If, in addition, anybody finds the reason for the different results, I'd be more than happy.

So, here's my solution:

%timeit [np.mean([np.dot(a[i], a[i+t]) for i in range(len(a)-t)]) for t in range(len(a))]

(Here a is the (N, 3) input array. Note: the original posted line used np.dot(a[t], a[i+t]), which keeps the first index fixed at t; that indexing is the likely source of the differing results.)

Result: 95.2 µs ± 3.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In comparison, your solution is about 4x faster:

%timeit vector_autocorrelate(a)

delivers: 24.8 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
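For what it's worth, a self-contained check (with a random (N, 3) array of my own standing in for a) suggests the discrepancy comes from the fixed first index: with np.dot(a[i], a[i+t]) instead of np.dot(a[t], a[i+t]), the naive loop matches the component-wise np.correlate result from Solution 1:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((32, 3))
n = len(a)

# naive loop with the first index varying (a[i], not a[t])
loop = np.array([np.mean([np.dot(a[i], a[i + t]) for i in range(n - t)])
                 for t in range(n)])

# component-wise np.correlate reference, as in vector_autocorrelate
ref = np.array([np.correlate(a[:, i], a[:, i], 'full')
                for i in range(3)])[:, n - 1:]
ref = ref.sum(axis=0) / (n - np.arange(n))

assert np.allclose(loop, ref)
```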

Solution 3:

Hi, I faced a similar issue. Here is my idea:

import numpy as np

def fast_vector_correlation(M):
    n_row = M.shape[0]
    # Gram matrix: all pairwise scalar products between row vectors
    dot_mat = M.dot(M.T)
    # the original list could not be divided in place, so build an array first
    corr = np.array([np.trace(dot_mat, offset=t) for t in range(n_row)])
    corr = corr / (n_row - np.arange(n_row))
    return corr

The idea is that dot_mat contains all the scalar products between the row vectors. To compute the correlation at different lags t, you just have to sum the diagonals of the upper-right triangular part.
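As a quick sanity check of the diagonal-sum idea (on a tiny random input of my own choosing), the trace over offsets agrees with a direct double loop:

```python
import numpy as np

rng = np.random.default_rng(42)
M = rng.random((8, 3))
n_row = len(M)

# Gram matrix: dot_mat[i, j] is the scalar product of rows i and j
dot_mat = M @ M.T
# summing the t-th upper diagonal gives sum_i M[i] . M[i+t]
corr = np.array([np.trace(dot_mat, offset=t) for t in range(n_row)])
corr = corr / (n_row - np.arange(n_row))

# direct double loop for comparison
direct = np.array([np.mean([M[i] @ M[i + t] for i in range(n_row - t)])
                   for t in range(n_row)])
assert np.allclose(corr, direct)
```

Note that this builds the full N-by-N Gram matrix, so while it is vectorized, memory grows quadratically with the number of vectors.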

Solution 4:

I'm posting the answer here in case someone else needs it, since it took me quite some time to figure out a viable approach. I ended up solving this by defining the following function:

import numpy as np

def vector_autocorrelate(t_array):
  n_vectors = len(t_array)
  # correlate each component independently; keep only non-negative lags
  acorr = np.array([np.correlate(t_array[:, i], t_array[:, i], 'full')
                    for i in range(3)])[:, n_vectors - 1:]
  # sum the correlations over the components
  acorr = np.sum(acorr, axis=0)
  # divide by the number of values actually measured and return
  acorr = np.array([val / (n_vectors - i) for i, val in enumerate(acorr)])
  return acorr

If anybody has a better idea, I would really like to hear it, since I think the current version is still not as compact as it could be. It's better than nothing though, which is why I'm posting it here.
