Pandas Replace Only Part Of A Column

January 28, 2024

Here is my input:

import pandas as pd
import numpy as np

list1 = [10,79,6,38,4,557,12,220,46,22,45,22]
list2 = [4,3,23,6,234,47,312,2,426,42,435,23]

df = pd.DataFrame({'A' : list

Solution 1:

EDIT:

Faster version (thanks to b2002):

ii = df[pd.notnull(df.C)].index
dd = np.diff(ii)
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
jj = [ii[0]] + jj

for ci in jj:
    df.C.values[ci:ci+3] = 1.0

First, find the indices of all starting points, i.e. the rows where C is 1.0 and a new run of three should begin. To do this, take the differences between the indices of the non-null rows in column C and keep every index that lies more than two rows after the previous one (the first non-null index is always included). Then iterate over those indices and use loc to set a three-row slice of column C to 1.0:

ii = df[pd.notnull(df.C)].index
dd = np.diff(ii)
jj = [ii[i] for i in range(1,len(ii)) if dd[i-1] > 2]
jj = [ii[0]] + jj

for ci in jj:
    df.loc[ci:ci+2,'C'] = 1.0

Result:

      A    B    C
0    10    4  NaN
1    79    3  1.0
2     6   23  1.0
3    38    6  1.0
4     4  234  NaN
5   557   47  1.0
6    12  312  1.0
7   220    2  1.0
8    46  426  NaN
9    22   42  NaN
10   45  435  NaN
11   22   23  NaN
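To make the index bookkeeping above easier to follow, here is a minimal, self-contained sketch that rebuilds the frame and prints the intermediate values. It assumes the default integer index and that column C is created with the same np.where expression used in Solution 2; the variable names ii, dd and jj follow the answer.

```python
import numpy as np
import pandas as pd

list1 = [10, 79, 6, 38, 4, 557, 12, 220, 46, 22, 45, 22]
list2 = [4, 3, 23, 6, 234, 47, 312, 2, 426, 42, 435, 23]

df = pd.DataFrame({"A": list1, "B": list2})
# Assumption: C starts out as 1.0 / NaN, built as in Solution 2
df["C"] = np.where(df["A"] > df["B"].shift(-2), 1.0, np.nan)

ii = df.index[df["C"].notna()].to_numpy()  # rows that already hold 1.0
dd = np.diff(ii)                           # gaps between consecutive marked rows

# Keep the first marked row, plus every marked row that lies
# more than 2 rows after the previous marked row
jj = [int(ii[0])] + [int(ii[i]) for i in range(1, len(ii)) if dd[i - 1] > 2]

for ci in jj:
    # .loc slicing is label-based and inclusive, so this covers 3 rows
    df.loc[ci:ci + 2, "C"] = 1.0

print("marked rows:", ii.tolist())
print("gaps:", dd.tolist())
print("start rows:", jj)
print(df)
```

With this input the marked rows are 1, 5 and 7. Row 7 is only two rows after row 5, so it is not treated as a new starting point and already falls inside the run that starts at row 5.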

Solution 2:

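import numpy as np
import pandas as pd

list1 = [10,79,6,38,4,557,12,220,46,22,45,22]
list2 = [4,3,23,6,234,47,312,2,426,42,435,23]

df = pd.DataFrame({'A' : list1, 'B' : list2}, columns = ['A', 'B'])
# 1 where A is greater than the B value two rows further down, NaN otherwise
df['C'] = np.where(df['A'] > df['B'].shift(-2), 1, np.nan)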

      A    B    C
0    10    4  NaN
1    79    3  1.0
2     6   23  NaN
3    38    6  NaN
4     4  234  NaN
5   557   47  1.0
6    12  312  NaN
7   220    2  1.0
8    46  426  NaN
9    22   42  NaN
10   45  435  NaN
11   22   23  NaN
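The initial C column depends on what shift(-2) does, so a small sketch may help. It uses only the first four values of each list; the names a_col, b_col, shifted and mask are illustrative and not part of the original answer.

```python
import pandas as pd

a_col = pd.Series([10, 79, 6, 38], name="A")
b_col = pd.Series([4, 3, 23, 6], name="B")

# shift(-2) moves values up by two rows: row i now holds the value from row i + 2.
# The last two rows have nothing to pull from, so they become NaN.
shifted = b_col.shift(-2)
print(shifted.tolist())

# Any comparison against NaN is False, so trailing rows can never be marked
mask = a_col > shifted
print(mask.tolist())
```

This is why the last two rows of the full frame are always NaN in column C before any filling takes place.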

Make an array from column C:

a = np.array(df.C)

This function tests segments of an array against a list of patterns and replaces each matching segment with the corresponding fill pattern. Because the fill values are greater than one, a segment that has already been replaced cannot match any later pattern.

def fill_segments(a, test_patterns, fill_patterns):
    # replace nans with zeros so fast numpy array_equal will work
    nan_idx = np.where(np.isnan(a))[0]
    np.put(a, nan_idx, 0.)
    col_index = list(np.arange(a.size))
    # loop forward through sequence comparing segment patterns
    for j in np.arange(len(test_patterns)):
        this_pattern = test_patterns[j]
        snip = len(this_pattern)
        rng = col_index[:-snip + 1]
        for i in rng:
            seg = a[col_index[i: i + snip]]
            if np.array_equal(seg, this_pattern):
                # when a match is found, replace values in array segment
                # with fill pattern
                pattern_indexes = col_index[i: i + snip]
                np.put(a, pattern_indexes, fill_patterns[j])
    # convert all fillers to ones
    np.put(a, np.where(a > 1.)[0], 1.)
    # convert zeros back to nans
    np.put(a, np.where(a == 0.)[0], np.nan)

    return a

Patterns to be replaced:

p1 = [1., 1., 1.]
p2 = [1., 0., 1.]
p3 = [1., 1., 0.]
p4 = [1., 0., 0.]

And corresponding fill patterns:

f1 = [5., 5., 5.]
f2 = [4., 4., 4.]
f3 = [3., 3., 3.]
f4 = [2., 2., 2.]

Collect them into the test_patterns and fill_patterns inputs:

patterns = [p1, p2, p3, p4]
fills = [f1, f2, f3, f4]

Run the function:

a = fill_segments(a, patterns, fills)

Assign a to column C:

df.C = a

df:

      A    B    C
0    10    4  NaN
1    79    3  1.0
2     6   23  1.0
3    38    6  1.0
4     4  234  NaN
5   557   47  1.0
6    12  312  1.0
7   220    2  1.0
8    46  426  NaN
9    22   42  NaN
10   45  435  NaN
11   22   23  NaN

The patterns and fills may need to be adjusted/added to depending on the way the input column is initially populated and the specific result sequence rules.
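The following sketch illustrates why the fill values are chosen to be greater than one. It is a condensed restatement of the pattern-fill idea, not the answer's exact code: the function name fill_segments_sketch and the short test sequence are made up for illustration, and the function works on a copy instead of modifying its input.

```python
import numpy as np

def fill_segments_sketch(values, test_patterns, fill_patterns):
    """Replace matching segments, one pattern after another (hypothetical helper)."""
    a = np.array(values, dtype=float)  # work on a copy
    a[np.isnan(a)] = 0.0               # NaN never equals NaN, so compare against 0
    for pattern, fill in zip(test_patterns, fill_patterns):
        n = len(pattern)
        for i in range(a.size - n + 1):
            if np.array_equal(a[i:i + n], pattern):
                a[i:i + n] = fill
    a[a > 1.0] = 1.0                   # fillers back to ones
    a[a == 0.0] = np.nan               # zeros back to NaN
    return a

seq = [1.0, np.nan, 1.0, np.nan, np.nan, np.nan]
patterns = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

# Fillers greater than one: the filled segment cannot match a later pattern
protected = fill_segments_sketch(seq, patterns, [[4.0] * 3, [2.0] * 3])

# Filling with plain ones: the 1.0 written at position 2 is matched again
# by the second pattern, so the run grows past three rows
naive = fill_segments_sketch(seq, patterns, [[1.0] * 3, [1.0] * 3])

print(protected)
print(naive)
```

In the protected run only the first three positions end up as 1.0, while the naive run spreads to five positions, which is the kind of sequence-rule difference to watch for when adjusting the patterns and fills.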
