Split The Data Frame Based On Consecutive Row Values Differences
I have a data frame like this, df col1 col2 col3 1 2 3 2 5 6 7 8 9 10 11 12 11 12 13 13 14 15 14 15
Solution 1:
You could define a custom grouper by taking the diff
, checking when it is greater than 1
, and take the cumsum
of the boolean series. Then group by the result and build a dictionary from the groupby object:
d = dict(tuple(df.groupby(df.col1.diff().gt(1).cumsum())))
print(d[0])
col1 col2 col3
0 1 2 3
1 2 5 6
print(d[1])
col1 col2 col3
2 7 8 9
A more detailed break-down:
df.assign(difference=(diff:=df.col1.diff()),
condition=(gt1:=diff.gt(1)),
grouper=gt1.cumsum())
col1 col2 col3 difference condition grouper
0 1 2 3 NaN False 0
1 2 5 6 1.0 False 0
2 7 8 9 5.0 True 1
3 10 11 12 3.0 True 2
4 11 12 13 1.0 False 2
5 13 14 15 2.0 True 3
6 14 15 16 1.0 False 3
Solution 2:
You can also peel off the target column and work with it as a series, rather than the above answer. That keeps everything smaller. It runs faster on the example, but I don't know how they'll scale up, depending how many times you're splitting.
row_bool = df['col1'].diff()>1
split_inds, = np.where(row_bool)
split_inds = np.insert(arr=split_inds, obj=[0,len(split_inds)], values=[0,len(df)])
df_tup = ()
for n in range(0,len(split_inds)-1):
tempdf = df.iloc[split_inds[n]:split_inds[n+1],:]
df_tup.append(tempdf)
(Just throwing it in a tuple of dataframes afterward, but the dictionary approach might be better?)
Post a Comment for "Split The Data Frame Based On Consecutive Row Values Differences"