Is There An Equivalent To R Apply Function In Python?
Solution 1:
The equivalent in NumPy is:
xx.sum(axis=2)
That is, you sum over axis 2 (the last dimension, of length 4), which leaves the other two dimensions, (2, 3), as the shape of the result:
array([[4., 4., 4.],
[4., 4., 4.]])
Perhaps a more literal translation of your R code would be:
np.apply_over_axes(np.sum, xx, 2)
This gives the same values, but it keeps the summed axis as a dimension of length 1, so the result has shape (2, 3, 1) instead of (2, 3). It is also likely to be slower, and it is not idiomatic unless the operation you are performing is more complicated than a sum.
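To compare the two calls side by side, here is a minimal sketch. It assumes xx is an array of ones with shape (2, 3, 4), which is inferred from the output shown above and is not stated in the original question.

```python
import numpy as np

# Assumption: xx is an array of ones with shape (2, 3, 4),
# inferred from the output shown above.
xx = np.ones((2, 3, 4))

summed = xx.sum(axis=2)                      # axis 2 is reduced away
applied = np.apply_over_axes(np.sum, xx, 2)  # axis 2 is kept with length 1

print(summed.shape)                             # (2, 3)
print(applied.shape)                            # (2, 3, 1)
print(np.array_equal(summed, applied[..., 0]))  # True
```

Indexing the last axis with applied[..., 0] (or calling np.squeeze) recovers the same 2-D array that xx.sum(axis=2) returns.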
Solution 2:
np.apply_over_axes is different from R's apply in several ways.
First, np.apply_over_axes needs the axes to collapse to be specified, whereas R's apply needs the axes to keep (its MARGIN argument) to be specified.
Second, np.apply_over_axes applies the function iteratively, one axis at a time, as the documentation quoted below states. The result is the same for np.sum, but it can be different for other functions.
func is called as res = func(a, axis), where axis is the first element of axes. The result res of the function call must have either the same dimensions as a or one less dimension. If res has one less dimension than a, a dimension is inserted before axis. The call to func is then repeated for each axis in axes, with res as the first argument.
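As a small illustration of why the iterative behavior matters, the sketch below uses np.std on a hand-picked 2x2 array (the array and the choice of np.std are my own, not from the original answer). Reducing one axis at a time gives a different answer from reducing both axes at once.

```python
import numpy as np

a = np.array([[0.0, 0.0],
              [2.0, 2.0]])

# apply_over_axes: std along axis 0 first, then std of that result along axis 1
stepwise = np.apply_over_axes(np.std, a, (0, 1))

# a single reduction over both axes at once
joint = np.std(a, axis=(0, 1), keepdims=True)

print(stepwise)  # [[0.]]
print(joint)     # [[1.]]
```

The column-wise standard deviations are both 1, and the standard deviation of [1, 1] is 0, whereas the standard deviation of all four values together is 1.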
In addition, the func passed to np.apply_over_axes must have a particular signature, func(a, axis), and its return value must have a particular shape (the same number of dimensions as its input, or one fewer) for np.apply_over_axes to work correctly.
Here's an example of how np.apply_over_axes fails when func does not accept an axis argument:
>>> arr.shape
(5, 4, 3, 2)
>>> np.apply_over_axes(np.mean, arr, (0,1))
array([[[[ 0.05856732, -0.14844212],
[ 0.34214183, 0.24319846],
[-0.04807454, 0.04752829]]]])
>>> np_mean = lambda x: np.mean(x)
>>> np.apply_over_axes(np_mean, arr, (0,1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 5, in apply_over_axes
  File "/Users/kwhkim/opt/miniconda3/envs/rtopython2-pip/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 495, in apply_over_axes
    res = func(*args)
TypeError: <lambda>() takes 1 positional argument but 2 were given
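The error arises because np.apply_over_axes calls func with two positional arguments, the array and the axis. A wrapper that accepts both avoids it. The sketch below is one way to write such a wrapper; the seeded generator is my own choice for reproducibility and is not how the original arr was created.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
arr = rng.normal(0, 1, (5, 4, 3, 2))

# The callable must accept (array, axis), because apply_over_axes calls func(a, axis).
np_mean = lambda x, axis: np.mean(x, axis)

out = np.apply_over_axes(np_mean, arr, (0, 1))
print(out.shape)  # (1, 1, 3, 2)
```

Note that this only fixes the signature. The function is still applied one axis at a time, and the reduced axes are kept with length 1.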
Since there seems to be no equivalent function in Python, I made a function that is similar to R's apply:
import numpy as np

def np_apply(arr, axes_remain, fun, *args, **kwargs):
    # Axes to keep, like MARGIN in R's apply (0-based, deduplicated, sorted).
    axes_remain = tuple(sorted(set(axes_remain)))
    arr_shape = arr.shape
    # Every other axis is collapsed and passed to fun as a single 1-D vector.
    axes_to_move = set(range(len(arr.shape)))
    for axis in axes_remain:
        axes_to_move.remove(axis)
    axes_to_move = tuple(sorted(axes_to_move))
    # Move the collapsed axes to the end, then flatten them into the last axis.
    arr2 = np.moveaxis(arr, axes_to_move, [-x for x in range(1, len(axes_to_move) + 1)]).copy()
    arr2 = arr2.reshape([arr_shape[x] for x in axes_remain] + [-1])
    return np.apply_along_axis(fun, -1, arr2, *args, **kwargs)
It works fine at least for the example above (the values are not exactly the same as the result above, but math.isclose() returns True for nearly all elements):
>>> np_apply(arr, (2,3), np.mean)
array([[ 0.05856732, -0.14844212],
[ 0.34214183, 0.24319846],
[-0.04807454, 0.04752829]])
>>> np_apply(arr, (2,3), np_mean)
array([[ 0.05856732, -0.14844212],
[ 0.34214183, 0.24319846],
[-0.04807454, 0.04752829]])
For the function to work well on large multidimensional arrays, it needs to be optimized. For instance, the explicit copy of the array should be avoided.
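One possible way to drop the explicit copy is sketched below. The name np_apply_sketch is hypothetical and this is not the author's implementation. It relies on transpose returning a view and on reshape copying only when the memory layout requires it. Unlike the function above, it keeps the collapsed axes in their original order.

```python
import numpy as np

def np_apply_sketch(arr, axes_remain, fun, *args, **kwargs):
    """Hypothetical variant: keep `axes_remain`, collapse all other axes."""
    keep = tuple(sorted(set(axes_remain)))
    drop = tuple(ax for ax in range(arr.ndim) if ax not in keep)
    # transpose returns a view; reshape copies only if the layout requires it
    moved = arr.transpose(keep + drop)
    flat = moved.reshape([arr.shape[ax] for ax in keep] + [-1])
    # fun receives each collapsed block as a 1-D vector
    return np.apply_along_axis(fun, -1, flat, *args, **kwargs)

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
arr = rng.normal(0, 1, (5, 4, 3, 2))

result = np_apply_sketch(arr, (2, 3), np.mean)
expected = arr.mean(axis=(0, 1))  # direct reduction over the collapsed axes

print(result.shape)                   # (3, 2)
print(np.allclose(result, expected))  # True
```

When the function is a reduction that already accepts a tuple of axes, such as np.mean or np.sum, calling it directly with axis=(0, 1) is simpler and avoids the Python-level loop inside np.apply_along_axis.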
Anyway it seems to work as a proof-of-concept and I hope it helps.
PS) arr is generated by arr = np.random.normal(0,1,(5,4,3,2))