Change With Nan If Values Stuck At A Single Value Over Time Using Python
As you can see below, my dataframe contains some identical consecutive values, i.e. 1, 2, and 3.

                  Date  Value
0  2017-07-18 07:40:00      1
1  2017-07-18 07:45:00      1
2  2017-07-18 07:50:0
Solution 1:
You could GroupBy consecutive values using a custom grouping scheme, check which groups have a size greater than or equal to 3, and use the result to index the dataframe and set the rows of interest to NaN:
import numpy as np

# label each run of identical consecutive values with its own group id
g = df.Value.diff().fillna(0).ne(0).cumsum()
# True for every row that belongs to a run of 3 or more
m = df.groupby(g).Value.transform('size').ge(3)
df.loc[m, 'Value'] = np.nan

                   Date   Value
0   2017-07-18 07:40:00     NaN
1   2017-07-18 07:45:00     NaN
2   2017-07-18 07:50:00     NaN
3   2017-07-18 07:55:00  2414.0
4   2017-07-18 08:00:00     2.0
5   2017-07-18 08:05:00     2.0
6   2017-07-18 08:10:00  4416.0
7   2017-07-18 08:15:00  4416.0
8   2017-07-18 08:20:00     NaN
9   2017-07-18 08:25:00     NaN
10  2017-07-18 08:30:00     NaN
11  2017-07-18 08:35:00  6998.0
Where:
# here df appears to be the original frame and df_ the copy with NaN assigned
df.assign(grouper=g, mask=m, result=df_.Value)

                   Date  Value  grouper   mask  result
0   2017-07-18 07:40:00      1        0   True     NaN
1   2017-07-18 07:45:00      1        0   True     NaN
2   2017-07-18 07:50:00      1        0   True     NaN
3   2017-07-18 07:55:00   2414        1  False  2414.0
4   2017-07-18 08:00:00      2        2  False     2.0
5   2017-07-18 08:05:00      2        2  False     2.0
6   2017-07-18 08:10:00   4416        3  False  4416.0
7   2017-07-18 08:15:00   4416        3  False  4416.0
8   2017-07-18 08:20:00      3        4   True     NaN
9   2017-07-18 08:25:00      3        4   True     NaN
10  2017-07-18 08:30:00      3        4   True     NaN
11  2017-07-18 08:35:00   6998        5  False  6998.0
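For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the Date and Value columns shown in the table above. The helper name `mask_stuck` and its `column` and `min_run` parameters are hypothetical, not part of the original answer. It also compares each value with its predecessor via `shift()` instead of `diff()`, which is a swapped-in variant that should also work for non-numeric columns.

```python
import pandas as pd

# Sample data reconstructed from the table in the answer above.
df = pd.DataFrame({
    "Date": pd.date_range("2017-07-18 07:40:00", periods=12, freq="5min"),
    "Value": [1, 1, 1, 2414, 2, 2, 4416, 4416, 3, 3, 3, 6998],
})


def mask_stuck(frame, column="Value", min_run=3):
    """Return a copy of frame with runs of >= min_run identical
    consecutive values in column replaced by NaN (hypothetical helper)."""
    values = frame[column]
    # A new run starts wherever a value differs from the previous one.
    run_id = values.ne(values.shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = values.groupby(run_id).transform("size")
    out = frame.copy()
    # Keep values from short runs; everything else becomes NaN.
    out[column] = values.where(run_len < min_run)
    return out


result = mask_stuck(df)
print(result)
```

Unlike the in-place `df.loc[m, 'Value'] = np.nan`, this version leaves `df` untouched and returns a new frame, which makes it easier to compare before and after.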
Solution 2:
Count the values. The result is a series, which needs a name so it can be referenced later:
counts = df['Value'].value_counts()
counts.name = '_'
Merge the selected values from the series with the original dataframe:
keep = counts[counts < 3]
df.merge(keep, left_on='Value', right_index=True)[df.columns]
#                    Date  Value
# 3   2017-07-18 07:55:00   2414
# 4   2017-07-18 08:00:00      2
# 5   2017-07-18 08:05:00      2
# 6   2017-07-18 08:10:00   4416
# 7   2017-07-18 08:15:00   4416
# 11  2017-07-18 08:35:00   6998
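The result is a filtered dataframe: rows whose value occurs three or more times are dropped rather than set to NaN. Note that value_counts() counts every occurrence of a value, not only consecutive ones.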
If you use pandas version <0.24, you should upgrade, but here is a workaround:
df.merge(pd.DataFrame(keep), left_on='Value', right_index=True)[df.columns]
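One thing to watch with this solution: it counts values across the whole column, so a value that repeats three times but never consecutively is dropped too. The sketch below uses a small made-up column (not the data from the question) to compare the global count with a per-run count. The `isin` filter is used here as a stand-in for the merge, on the assumption that it selects the same rows.

```python
import pandas as pd

# Made-up data: 5 appears three times but never twice in a row,
# while 8 is genuinely stuck for three consecutive rows.
df = pd.DataFrame({"Value": [5, 7, 5, 9, 5, 8, 8, 8]})

# Global counting, as in Solution 2.
counts = df["Value"].value_counts()
keep = counts[counts < 3]
global_filtered = df[df["Value"].isin(keep.index)]

# Per-run counting: only consecutive repeats are considered.
run_id = df["Value"].ne(df["Value"].shift()).cumsum()
run_len = df.groupby(run_id)["Value"].transform("size")
run_filtered = df[run_len < 3]

print(global_filtered.index.tolist())  # rows kept by the global count
print(run_filtered.index.tolist())     # rows kept by the per-run count
```

If the goal is to catch a sensor stuck at one reading, the per-run count is probably the closer match. The global count is fine when a value should be rare overall.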