
Performance Of Xtensor Types Vs. NumPy For Simple Reduction

I was trying out xtensor-python and started by writing a very simple sum function, after using the cookiecutter setup and enabling SIMD intrinsics with xsimd:
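A minimal sketch of that function, assuming the usual xtensor-python cookiecutter types; the original snippet was truncated at `inline double sum_py`, so the parameter type and body below are assumptions:

```cpp
#include "xtensor/xmath.hpp"           // xt::sum
#include "xtensor-python/pyarray.hpp"  // xt::pyarray, the NumPy-backed array type

// Sum all elements of a NumPy array passed in from Python.
// (Sketch: everything past the truncated signature is an assumption.)
inline double sum_py(xt::pyarray<double>& m)
{
    // xt::sum builds a lazy expression; operator() evaluates it to a scalar.
    return xt::sum(m)();
}
```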

Solution 1:

Wow, this is a coincidence! I am working on exactly this speedup!

xtensor's sum is a lazy operation, and it doesn't use the most performant iteration order for (auto-)vectorization. However, we just added an evaluation_strategy parameter to reductions (and the upcoming accumulations) which allows you to select between immediate and lazy evaluation.

Immediate reductions perform the reduction immediately (rather than lazily) and can use an iteration order optimized for vectorized reductions.
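For example, a minimal sketch of selecting the strategy, assuming the `xt::evaluation_strategy::immediate` tag introduced by this change (the exact spelling may differ between xtensor versions):

```cpp
#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xmath.hpp"
#include "xtensor/xio.hpp"

int main()
{
    xt::xarray<double> a = {{1.0, 2.0, 3.0},
                            {4.0, 5.0, 6.0}};

    // Lazy (default): builds an unevaluated expression; elements are
    // computed on access, in whatever order the caller iterates.
    auto lazy = xt::sum(a, {1});

    // Immediate: reduces right away, using an iteration order chosen
    // to suit (auto-)vectorization.
    auto eager = xt::sum(a, {1}, xt::evaluation_strategy::immediate);

    std::cout << lazy << '\n' << eager << '\n';  // both print the row sums { 6., 15. }
    return 0;
}
```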

You can find this feature in this PR: https://github.com/QuantStack/xtensor/pull/550

In my benchmarks this should be at least as fast as, or faster than, NumPy. I hope to get it merged today.

By the way, please don't hesitate to drop by our Gitter channel and post a link to the question; we need to monitor StackOverflow better: https://gitter.im/QuantStack/Lobby

