How To Vectorize A Function That Uses Both Row And Column Elements Of A Dataframe
I have two inputs in a dataframe, and I need to create an output that depends on both inputs (same row, different columns), but also on its previous value (same column, previous ro
Solution 1:
If I understand you right, you want to know how to compute the column output. You can do it, for example, like this:
import numpy as np

df['output_2'] = (df['input_1'] + df['input_2']).replace(1, np.nan).ffill().replace(2, 1).astype(int)
print(df)
Prints:
input_1 input_2 output output_2
0 0 0 0 0
1 0 1 0 0
2 0 0 0 0
3 1 1 1 1
4 0 1 1 1
5 0 1 1 1
6 0 0 0 0
7 0 1 0 0
8 0 1 0 0
9 1 1 1 1
10 1 1 1 1
11 0 1 1 1
12 0 1 1 1
13 1 1 1 1
14 0 1 1 1
15 0 1 1 1
16 0 0 0 0
17 0 1 0 0
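To see why the chained expression works, it may help to look at each intermediate step separately. The following is only a sketch on a small hypothetical frame (the data and the `steps` helper frame are illustrative assumptions, not part of the original answer), and it only contains the combinations (0, 0), (0, 1) and (1, 1):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the combination (1, 0) does not occur here.
df = pd.DataFrame(
    [[0, 0], [0, 1], [1, 1], [0, 1], [0, 0], [0, 1]],
    columns=["input_1", "input_2"],
)

# Sum of the inputs: 0 -> output 0, 1 -> keep previous value, 2 -> output 1.
steps = pd.DataFrame({"sum": df["input_1"] + df["input_2"]})
# Mark the "keep previous value" rows as missing ...
steps["hold_as_nan"] = steps["sum"].replace(1, np.nan)
# ... so that ffill() can carry the last known value forward.
steps["filled"] = steps["hold_as_nan"].ffill()
# Finally map the code 2 back to the output value 1.
steps["output_2"] = steps["filled"].replace(2, 1).astype(int)
print(steps)
```

For this sample the final `output_2` column is 0, 0, 1, 1, 0, 0: the rows with sum 1 simply inherit whatever value came before them.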
Solution 2:
As you explained in the discussion above, we have just two inputs, loaded into a pandas dataframe:
import pandas as pd

df = pd.DataFrame([[0,0], [0,1], [0,0], [1,1], [0,1], [0,1], [0,0], [0,1], [0,1], [1,1], [1,1], [0,1], [0,1], [1,1], [0,1], [0,1], [0,0], [0,1]], columns=['input_1', 'input_2'])
We have to create the outputs using the following rules:
#1 if input_1 is one, the output is one
#2 if both inputs are zero, the output is zero
#3 if input_1 is zero and input_2 is one, the output holds the previous value
#4 the initial output value is zero
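To generate the outputs we can:
- copy input_1 to the output
- overwrite the output with its previous value if input_1 is zero and input_2 is one
Because of the rules above, we don't need to update the first output: if the first row is a "hold" row, input_1 is zero, so the copied value already equals the initial value of zero.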
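df['output'] = df['input_1']
for idx, row in df.iterrows():
    # Rule #3: hold the previous output when input_1 is 0 and input_2 is 1
    if (idx > 0) and (row.input_1 == 0) and (row.input_2 == 1):
        # .loc writes into df directly; chained indexing (df.output[idx] = ...) may write to a copy
        df.loc[idx, 'output'] = df.loc[idx - 1, 'output']
print(df)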
The output is:
>>>print(df)
input_1 input_2 output
0 0 0 0
1 0 1 0
2 0 0 0
3 1 1 1
4 0 1 1
5 0 1 1
6 0 0 0
7 0 1 0
8 0 1 0
9 1 1 1
10 1 1 1
11 0 1 1
12 0 1 1
13 1 1 1
14 0 1 1
15 0 1 1
16 0 0 0
17 0 1 0
UPDATE1
A faster way to do it is a modification of the formula proposed by @Andrej:
df['output_2'] = (df['input_1'] + df['input_2'] * 2).replace(2, np.nan).ffill().replace(3, 1).astype(int)
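Without the modification, his formula creates the wrong output for the input combination [1, 0]: it holds the previous output instead of setting it to 1. Multiplying input_2 by 2 gives each of the four input combinations its own code (0, 1, 2 or 3), so only [0, 1] (code 2) is treated as "hold".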
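The same idea can also be written with an explicit boolean mask instead of encoded sums, which some readers may find easier to follow. This is only a sketch: the function name `latch_output` and the sample data are hypothetical, and the `fillna(0)` step is an added assumption that applies rule #4 when the very first row is a "hold" row (without it, a leading NaN would make `astype(int)` fail).

```python
import pandas as pd

def latch_output(df: pd.DataFrame) -> pd.Series:
    """Hold the previous output when (input_1, input_2) == (0, 1)."""
    hold = (df["input_1"] == 0) & (df["input_2"] == 1)
    # Start from input_1 (rules #1 and #2) and blank out the "hold" rows.
    out = df["input_1"].astype(float).mask(hold)
    # Carry the last known value forward; a leading gap gets the
    # initial value 0 (rule #4).
    return out.ffill().fillna(0).astype(int)

# Hypothetical sample that starts with a "hold" row and contains [1, 0].
df = pd.DataFrame(
    [[0, 1], [1, 0], [0, 1], [0, 0], [0, 1], [1, 1], [0, 1]],
    columns=["input_1", "input_2"],
)
df["output"] = latch_output(df)
print(df)
```

For this sample the output column is 0, 1, 1, 0, 0, 1, 1.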
UPDATE2
This is just to compare the results:
import numpy as np
import pandas as pd

df = pd.DataFrame([[0,0], [1,0], [0,1], [1,1], [0,1], [0,1], [0,0], [0,1], [0,1], [1,1], [1,1], [0,1], [0,1], [1,1], [0,1], [0,1], [0,0], [0,1]], columns=['input_1', 'input_2'])
# output: reference result from the explicit loop
df['output'] = df['input_1']
for idx, row in df.iterrows():
    if (idx > 0) and (row.input_1 == 0) and (row.input_2 == 1):
        df.loc[idx, 'output'] = df.loc[idx - 1, 'output']
# output_1: modified formula
df['output_1'] = (df['input_1'] + df['input_2'] * 2).replace(2, np.nan).ffill().replace(3, 1).astype(int)
# output_2: original formula from Solution 1
df['output_2'] = (df['input_1'] + df['input_2']).replace(1, np.nan).ffill().replace(2, 1).astype(int)
print(df)
The result is:
>>>print(df)
input_1 input_2 output output_1 output_2
0 0 0 0 0 0
1 1 0 1 1 0
2 0 1 1 1 0
3 1 1 1 1 1
4 0 1 1 1 1
5 0 1 1 1 1
6 0 0 0 0 0
7 0 1 0 0 0
8 0 1 0 0 0
9 1 1 1 1 1
10 1 1 1 1 1
11 0 1 1 1 1
12 0 1 1 1 1
13 1 1 1 1 1
14 0 1 1 1 1
15 0 1 1 1 1
16 0 0 0 0 0
17 0 1 0 0 0
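A single hand-written frame only covers a few cases, so one way to gain more confidence is to compare the loop and the vectorized formula on many random inputs. The sketch below is an assumption-laden illustration: the function names `output_loop` and `output_vectorized` are hypothetical, and the vectorized version adds `fillna(0)` to the formula above so that a frame starting with [0, 1] does not leave a NaN behind.

```python
import numpy as np
import pandas as pd

def output_loop(df: pd.DataFrame) -> pd.Series:
    """Reference implementation: plain loop over the rows."""
    in1 = df["input_1"].to_numpy()
    in2 = df["input_2"].to_numpy()
    out = in1.copy()
    for i in range(1, len(out)):
        if in1[i] == 0 and in2[i] == 1:
            out[i] = out[i - 1]  # rule #3: hold the previous value
    return pd.Series(out, index=df.index)

def output_vectorized(df: pd.DataFrame) -> pd.Series:
    """Modified formula, plus fillna(0) (an addition) for a leading hold row."""
    code = df["input_1"] + df["input_2"] * 2
    return code.replace(2, np.nan).ffill().fillna(0).replace(3, 1).astype(int)

rng = np.random.default_rng(0)
for _ in range(200):
    n = int(rng.integers(1, 50))
    df = pd.DataFrame(rng.integers(0, 2, size=(n, 2)),
                      columns=["input_1", "input_2"])
    # Compare values only, so platform-dependent integer widths do not matter.
    assert np.array_equal(output_loop(df).to_numpy(),
                          output_vectorized(df).to_numpy())
print("loop and vectorized results agree")
```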