Computing Autocorrelation Of Vectors With Numpy
Solution 1:
The NumPy routines are for 1D arrays. As a "minimal" improvement, use a vectorized operation for the normalisation step (use of np.arange
in the before-to-last line):
defvector_autocorrelate(t_array):
n_vectors = len(t_array)
# correlate each component indipendently
acorr = np.array([np.correlate(t_array[:,i],t_array[:,i],'full') for i in xrange(3)])[:,n_vectors-1:]
# sum the correlations for each component
acorr = np.sum(acorr, axis = 0)
# divide by the number of values actually measured and return
acorr /= (n_vectors - np.arange(n_vectors))
return acorr
For larger array sizes, you should consider using the Fourier Transform algorithm for correlation. Check out the examples of the library tidynamics if you are interested in that (disclaimer: I wrote the library, it depends only on NumPy).
For reference, here are the timings for a NumPy-code (that I wrote for testing), your code vector_autocorrelate and tidynamics.
size = [2**i for i in range(4, 17, 2)]
np_time = []
ti_time = []
va_time = []
for s in size:
data = np.random.random(size=(s, 3))
t0 = time.time()
correlate = np.array([np.correlate(data[:,i], data[:,i], mode='full') for i in range(data.shape[1])])[:,:s]
correlate = np.sum(correlate, axis=0)/(s-np.arange(s))
np_time.append(time.time()-t0)
t0 = time.time()
correlate = tidynamics.acf(data)
ti_time.append(time.time()-t0)
t0 = time.time()
correlate = vector_autocorrelate(data)
va_time.append(time.time()-t0)
You can see the results:
print("size", size)
print("np_time", np_time)
print("va_time", va_time)
print("ti_time", ti_time)
size [16, 64, 256, 1024, 4096, 16384, 65536]
np_time [0.00023794174194335938, 0.0002703666687011719, 0.0002713203430175781, 0.001544952392578125, 0.0278470516204834, 0.36094141006469727, 6.922360420227051]
va_time [0.00021696090698242188, 0.0001690387725830078, 0.000339508056640625, 0.0014629364013671875, 0.024930953979492188, 0.34442687034606934, 7.005630731582642]
ti_time [0.0011148452758789062, 0.0008449554443359375, 0.0007512569427490234, 0.0010488033294677734, 0.0026645660400390625, 0.007939338684082031, 0.048232316970825195]
or plot them
plt.plot(size, np_time)
plt.plot(size, va_time)
plt.plot(size, ti_time)
plt.loglog()
For anything but very small data series, the "N**2" algorithm is unusable.
Solution 2:
Here's my result. It is slower (about 4x) and delivers other results. Why do I post it anyway? I thought it's worth to see how to measure and what's the difference. If - in addition - anybody finds the reason for the different results, I'd be more then happy.
So, here's my solution:
%timeit [np.mean([np.dot(a[t], a[i+t]) for i in range(len(a)-t)]) for t in range(len(a))]
Result: 95.2 µs ± 3.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In comparison, your solution is about 4x faster:
%timeit vector_autocorrelate(a)
delivers: 24.8 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Solution 3:
Hi I faced a imilar issue. Here is my idea
def fast_vector_correlation(M):
n_row = M.shape[0]
dot_mat = M.dot(M.T)
corr = [np.trace(dot_mat,offset=x) for x in range(n_row)]
corr/=(n_row-np.arange(n_row))
return corr
The idea is that dot_mat
contains all the scalar product between the row vectors. To compute the correlation at different t
values you have just to sum the diagonals (of the upper right riangular part), as show in the picture.
Solution 4:
I'm posting the answer here in case someone else will need it, since it took me quite some time to figure out a viable approach. I ended up solving this by defining the following function
defvector_autocorrelate(t_array):
n_vectors = len(t_array)
# correlate each component indipendently
acorr = np.array([np.correlate(t_array[:,i],t_array[:,i],'full') for i in xrange(3)])[:,n_vectors-1:]
# sum the correlations for each component
acorr = np.sum(acorr, axis = 0)
# divide by the number of values actually measured and return
acorr = np.array( [ val / (n_vectors - i) for i,val inenumerate(acorr)])
return acorr
If anybody has a better idea, I would really like to hear that since I think the current one is still not as compact as it should be. It's better than nothing though, which is why I'm posting it here.
Post a Comment for "Computing Autocorrelation Of Vectors With Numpy"