Pandas Replace Only Part Of A Column
Solution 1:
EDIT:
Faster version (thanks to b2002):
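ii = df[pd.notnull(df.C)].index
dd = np.diff(ii)
# keep every non-null index that is more than 2 rows after the previous one
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
# the first non-null index is always a starting point
jj = [ii[0]] + jj
for ci in jj:
    # positional write into the underlying array: assumes a default 0..n-1 index
    # and may fail if pandas copy-on-write makes .values read-only
    df.C.values[ci:ci+3] = 1.0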
First get the indices of all your starting points, i.e. all the points that are 1.0 and have two NaN following. You find them by looking at the differences between the indices of the non-null entries in the C column (the first non-null index is always included). Then iterate over those indices and use loc to change slices of your C column:
ii = df[pd.notnull(df.C)].index
dd = np.diff(ii)
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
jj = [ii[0]] + jj
for ci in jj:
    # loc slices by label and includes the end label, hence ci+2
    df.loc[ci:ci+2,'C'] = 1.0
Result:
A B C
0 10 4 NaN
1 79 3 1.0
2 6 23 1.0
3 38 6 1.0
4 4 234 NaN
5 557 47 1.0
6 12 312 1.0
7 220 2 1.0
8 46 426 NaN
9 22 42 NaN
10 45 435 NaN
11 22 23 NaN
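The steps of Solution 1 can be sketched end to end as below. This is only an illustration, not the answer's original code: it borrows the sample data from Solution 2, the names pos, gaps and starts are made up for the sketch, and it works with row positions through iloc instead of index labels. It also assumes column C has at least one non-null value.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from Solution 2 on this page.
df = pd.DataFrame({
    "A": [10, 79, 6, 38, 4, 557, 12, 220, 46, 22, 45, 22],
    "B": [4, 3, 23, 6, 234, 47, 312, 2, 426, 42, 435, 23],
})
df["C"] = np.where(df["A"] > df["B"].shift(-2), 1.0, np.nan)

# Row positions (not labels) of the non-null entries in C.
# Assumption: there is at least one, otherwise pos[0] below raises IndexError.
pos = np.flatnonzero(df["C"].notna().to_numpy())
gaps = np.diff(pos)

# Starting points: the first non-null position, plus every position that is
# more than 2 rows after the previous non-null position.
starts = [int(pos[0])] + [int(pos[i]) for i in range(1, len(pos)) if gaps[i - 1] > 2]

# Fill three rows from each starting point (iloc slices exclude the end).
c_col = df.columns.get_loc("C")
for s in starts:
    df.iloc[s:s + 3, c_col] = 1.0

print(starts)
print(df)
```

Working with positions avoids relying on the index being 0..n-1, which the label-based versions above implicitly do.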
Solution 2:
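import numpy as np
import pandas as pd

list1 = [10,79,6,38,4,557,12,220,46,22,45,22]
list2 = [4,3,23,6,234,47,312,2,426,42,435,23]
df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
# mark rows where A is greater than the value of B two rows further down
df['C'] = np.where(df['A'] > df['B'].shift(-2), 1, np.nan)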
A B C
0 10 4 NaN
1 79 3 1.0
2 6 23 NaN
3 38 6 NaN
4 4 234 NaN
5 557 47 1.0
6 12 312 NaN
7 220 2 1.0
8 46 426 NaN
9 22 42 NaN
10 45 435 NaN
11 22 23 NaN
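As a side note on how column C is built, here is a small sketch of what shift(-2) does. The four-row series a and b are just the first four values of A and B, chosen for illustration. Shifting by -2 pulls each value up two rows and leaves NaN in the last two, and any comparison against NaN is False, which is why the final two rows of C are always NaN.

```python
import pandas as pd

a = pd.Series([10, 79, 6, 38])
b = pd.Series([4, 3, 23, 6])

# Each row now holds the value of b from two rows further down.
shifted = b.shift(-2)

# Comparisons against NaN evaluate to False.
mask = a > shifted

print(shifted.tolist())
print(mask.tolist())
```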
Make an array from column C:
a = np.array(df.C)
This function tests segments of an array against a list of patterns and replaces each matching segment with the corresponding fill pattern. Previously matched segments are not considered for later matches, because the filler numbers are greater than one and so no longer match any test pattern.
def fill_segments(a, test_patterns, fill_patterns):
    # replace nans with zeros so fast numpy array_equal will work
    nan_idx = np.where(np.isnan(a))[0]
    np.put(a, nan_idx, 0.)
    col_index = list(np.arange(a.size))
    # loop forward through sequence comparing segment patterns
    for j in np.arange(len(test_patterns)):
        this_pattern = test_patterns[j]
        snip = len(this_pattern)
        rng = col_index[:-snip + 1]
        for i in rng:
            seg = a[col_index[i: i + snip]]
            if np.array_equal(seg, this_pattern):
                # when a match is found, replace values in array segment
                # with fill pattern
                pattern_indexes = col_index[i: i + snip]
                np.put(a, pattern_indexes, fill_patterns[j])
    # convert all fillers to ones
    np.put(a, np.where(a > 1.)[0], 1.)
    # convert zeros back to nans
    np.put(a, np.where(a == 0.)[0], np.nan)
    return a
Patterns to be replaced:
p1 = [1., 1., 1.]
p2 = [1., 0., 1.]
p3 = [1., 1., 0.]
p4 = [1., 0., 0.]
And corresponding fill patterns:
f1 = [5., 5., 5.]
f2 = [4., 4., 4.]
f3 = [3., 3., 3.]
f4 = [2., 2., 2.]
Build the test_patterns and fill_patterns inputs:
patterns = [p1, p2, p3, p4]
fills = [f1, f2, f3, f4]
Run the function:
a = fill_segments(a, patterns, fills)
Assign a to column C:
df.C = a
df:
A B C
0 10 4 NaN
1 79 3 1.0
2 6 23 1.0
3 38 6 1.0
4 4 234 NaN
5 557 47 1.0
6 12 312 1.0
7 220 2 1.0
8 46 426 NaN
9 22 42 NaN
10 45 435 NaN
11 22 23 NaN
The patterns and fills may need to be adjusted or extended, depending on how the input column is initially populated and on the specific rules for the result sequence.
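If the rule is simply "every 1.0 that is not already inside a filled window starts a new window of three ones", a single forward scan may be enough and no pattern table is needed. The sketch below assumes exactly that rule. The helper fill_windows and its width parameter are hypothetical names for illustration. It reproduces the result above on the sample column, but it is not guaranteed to agree with fill_segments on every input, since the two process matches in a different order.

```python
import numpy as np

def fill_windows(values, width=3):
    """Hypothetical helper: each 1.0 found outside an already filled
    window starts a new window of `width` ones."""
    out = np.array(values, dtype=float)  # work on a copy of the input
    i = 0
    while i < out.size:
        if out[i] == 1.0:  # NaN never compares equal to 1.0
            out[i:i + width] = 1.0  # the slice is clipped at the array end
            i += width  # skip past the window just filled
        else:
            i += 1
    return out

nan = np.nan
c = np.array([nan, 1, nan, nan, nan, 1, nan, 1, nan, nan, nan, nan])
filled = fill_windows(c)
print(filled)
```

The result can be assigned back with df.C = filled, as in the step above.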