Python Dataframe Set True If Last N Rows Are True

I want to create a new column that is True when the last n rows of another column are all True. My code works as intended, but it takes a long time to run. dfx = pd.DataFra

Solution 1:

Use pandas.Series.rolling:

n = 2
dfx["A"].rolling(n).sum().eq(n)

Output:

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8    False
9    False
Name: A, dtype: bool
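A self-contained sketch of why this works: booleans sum as 0/1, so a window of n rows sums to n only when every row in it is True. The sample data is an assumption (one repetition of the list used in the benchmark below, which matches the output shown), and the column name `last_n_true` is made up for illustration.

```python
import pandas as pd

# Assumed sample data: one repetition of the list used in the benchmark
dfx = pd.DataFrame({"A": [False, False, False, False, True, True, True, True, False, True]})

n = 2
# Each window covers the current row and the n-1 rows before it.
# The first n-1 windows are incomplete, so their sum is NaN and .eq(n) gives False.
dfx["last_n_true"] = dfx["A"].rolling(n).sum().eq(n)  # hypothetical column name
print(dfx)
```

Assigning the result back to the frame gives the new column the question asks for.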

Benchmark against OP (about 1000x faster):

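import numpy as np
import pandas as pd

n = 2
cl_id = 0  # assumption: positional index of column "A"; the OP's own definition is cut off in the question
dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]*1000})

%timeit -n10 l1 = dfx["A"].rolling(n).sum().eq(n)
# 702 µs ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n10 l2 = [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
# 908 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Names assigned inside %timeit are not kept, so compute both results once more to compare them
l1 = dfx["A"].rolling(n).sum().eq(n)
l2 = [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
l1.tolist() == l2
# True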
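One caveat about that comparison: the list comprehension hard-codes False for the first n rows, while `rolling(n)` leaves only the first n-1 rows incomplete. The two agree on the benchmark data because it starts with False values, but they can differ at index n-1 when the column starts with a run of True. The sketch below uses a small made-up Series to show the difference, and one possible way to reproduce the loop's output exactly if that is what is wanted (the truncated question does not make this clear).

```python
import pandas as pd

# Made-up data that starts with True values, to expose the edge case
s = pd.Series([True, True, True, False], name="A")
n = 2

# Rolling version: row n-1 already has a full window
rolled = s.rolling(n).sum().eq(n)

# Loop version in the style of the benchmark: first n rows forced to False
loop_style = [False] * n + [all(s.iloc[x + 1 - n : x + 1].tolist()) for x in range(n, len(s))]

# Force the first n rows to False to match the loop exactly
matched = rolled.copy()
matched.iloc[:n] = False

print(rolled.tolist())   # rolling result
print(loop_style)        # loop result
print(matched.tolist())  # rolling result adjusted to match the loop
```

If row n-1 should count as a full window, the plain rolling result is the one to keep.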
