Skip to content Skip to sidebar Skip to footer

How To Vectorize A Function That Uses Both Row And Column Elements Of A Dataframe

I have two inputs in a dataframe, and I need to create an output that depends on both inputs (same row, different columns), but also on its previous value (same column, previous ro

Solution 1:

If I understand you right, you want to know how to compute column output. You can do for example:

df['output_2'] = (df['input_1'] + df['input_2']).replace(1, np.nan).ffill().replace(2, 1).astype(int)
print(df)

Prints:

    input_1  input_2  output  output_2
00000101002000031111401115011160000701008010091111101111110111120111131111140111150111160000170100

Solution 2:

As you explained in the discussion above we have just two inputs loaded using pandas dataframe:

df=pd.DataFrame([[0,0], [0,1], [0,0], [1,1], [0,1], [0,1], [0,0], [0,1], [0,1], [1,1], [1,1], [0,1], [0,1], [1,1], [0,1], [0,1], [0,0], [0,1]], columns=['input_1', 'input_2'])

We have to create outputs using following rules:

#1 if input_1 is one the output is one#2 if both inputs is zero the output is zero#3 if input_1 is zero and input_2 is one the output holds the previous value#4 the initial output value is zero

to generate outputs we can

  1. duplicate input_1 to the output
  2. update output with previous value if input_1 is zero and input_2 is one

because of the rules above we don't need to update the first output

df['output'] = df.input_1

for idx, row in df.iterrows():
   if (idx > 0) and (row.input_1 == 0) and (row.input_2 == 1):
       df.output[idx] = df.output[idx-1]

print(df)

The output is:

>>>print(df)
    input_1  input_2  output
0         0        0       0
1         0        1       0
2         0        0       0
3         1        1       1
4         0        1       1
5         0        1       1
6         0        0       0
7         0        1       0
8         0        1       0
9         1        1       1
10        1        1       1
11        0        1       1
12        0        1       1
13        1        1       1
14        0        1       1
15        0        1       1
16        0        0       0
17        0        1       0

UPDATE1

The more fast way to do it is modification of formula proposed by @Andrej

df['output_2'] = (df['input_1'] + df['input_2'] * 2).replace(2, np.nan).ffill().replace(3, 1).astype(int)

Without modification his formula creates wrong output for input combination [1, 0]. It holds the previous output instead of setting it to 1.

UPDATE2

This just to compare results

df=pd.DataFrame([[0,0], [1,0], [0,1], [1,1], [0,1], [0,1], [0,0], [0,1], [0,1], [1,1], [1,1], [0,1], [0,1], [1,1], [0,1], [0,1], [0,0], [0,1]], columns=['input_1', 'input_2'])

df['output'] = df.input_1
for idx, row in df.iterrows():
   if (idx > 0) and (row.input_1 == 0) and (row.input_2 == 1):
       df.output[idx] = df.output[idx-1]

df['output_1'] = (df['input_1'] + df['input_2'] * 2).replace(2, np.nan).ffill().replace(3, 1).astype(int)
df['output_2'] = (df['input_1'] + df['input_2']).replace(1, np.nan).ffill().replace(2, 1).astype(int)
print(df)

The results is:

>>>print(df)
    input_1  input_2  output  output_1  output_2
0         0        0       0         0         0
1         1        0       1         1         0
2         0        1       1         1         0
3         1        1       1         1         1
4         0        1       1         1         1
5         0        1       1         1         1
6         0        0       0         0         0
7         0        1       0         0         0
8         0        1       0         0         0
9         1        1       1         1         1
10        1        1       1         1         1
11        0        1       1         1         1
12        0        1       1         1         1
13        1        1       1         1         1
14        0        1       1         1         1
15        0        1       1         1         1
16        0        0       0         0         0
17        0        1       0         0         0

Post a Comment for "How To Vectorize A Function That Uses Both Row And Column Elements Of A Dataframe"