Parse Data From Column Using If's
I have a dataframe column that contains multiple different text qualifiers and I want to be able to set a new column that for each row checks if text is in each row and if so do th
Solution 1:
No need to use numpy; pandas has a few different options for this sort of operation.
import pandas as pd

def parse_row_col1(row):
    # Return the text between the first matching qualifier and its terminator
    result = ""
    if 'TL~' in row.COL1:
        result = row.COL1.split('TL~')[1].split('_SP~')[0]
    elif 'TB~' in row.COL1:
        result = row.COL1.split('TB~')[1].split('_SP~')[0]
    elif 'PE~' in row.COL1:
        result = row.COL1.split('PE~')[1].split('_BA~')[0]
    return result

# df is the dataframe from the question, with a COL1 column
parse_res = pd.Series((parse_row_col1(curr) for curr in df.itertuples(index=False)))
This method, iterating over row tuples, isn't as fast as using numpy's select, but should be far less complex when dealing with a large number of conditions. Not only that, but as @rpanai points out in his answer, with select you have to watch for conditions that are not mutually exclusive (it uses the first one that matches), whereas the if/elif chain above spells out that priority order explicitly.
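As a quick check, here is a minimal sketch that runs the row-wise parser on the three sample rows used in Solution 2 and stores the result in a new column. The COL2 column name is borrowed from the other answer, and the dataframe is rebuilt inline so the snippet runs on its own; neither is part of this answer's original code.

```python
import pandas as pd

# Sample rows copied from Solution 2 (assumed to match the question's data)
df = pd.DataFrame({"COL1": [
    "PB~Cucumber_IT~_TL~Vegatables_SP~",
    "PB~Potato_IT~_TB~Starch_SP~",
    "PB~Onion_IT~_PE~Vegatables_BA~",
]})

def parse_row_col1(row):
    # Return the text between the first matching qualifier and its terminator
    result = ""
    if 'TL~' in row.COL1:
        result = row.COL1.split('TL~')[1].split('_SP~')[0]
    elif 'TB~' in row.COL1:
        result = row.COL1.split('TB~')[1].split('_SP~')[0]
    elif 'PE~' in row.COL1:
        result = row.COL1.split('PE~')[1].split('_BA~')[0]
    return result

# Assigning a plain list avoids index-alignment surprises if df has a custom index
df["COL2"] = [parse_row_col1(curr) for curr in df.itertuples(index=False)]
print(df["COL2"].tolist())  # ['Vegatables', 'Starch', 'Vegatables']
```

Rows that match none of the qualifiers end up with an empty string, since that is the initial value of result.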
Solution 2:
IIUC, this is a case where you can apply np.select (see the doc).
import numpy as np
import pandas as pd
from io import StringIO

txt = """COL1
0 PB~Cucumber_IT~_TL~Vegatables_SP~
1 PB~Potato_IT~_TB~Starch_SP~
2 PB~Onion_IT~_PE~Vegatables_BA~"""

# sep=r"\s+" splits on whitespace (equivalent to the deprecated delim_whitespace=True)
df = pd.read_csv(StringIO(txt), sep=r"\s+")

# One boolean mask per qualifier
condList = [df["COL1"].str.contains("TL~"),
            df["COL1"].str.contains("TB~"),
            df["COL1"].str.contains("PE~")]

# Text after each qualifier, minus the 4-character terminator (_SP~ or _BA~)
choiceList = [df["COL1"].str.split('TL~').str[1].str[:-4],
              df["COL1"].str.split('TB~').str[1].str[:-4],
              df["COL1"].str.split('PE~').str[1].str[:-4]]

df["COL2"] = np.select(condList, choiceList)
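You have to be sure that the conditions are all mutually exclusive, or else list them in priority order: when several conditions are true for a row, np.select uses the first one that matches.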
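To see what np.select does when conditions overlap, here is a minimal sketch on a toy array. The array, the labels, and the default value are made up for illustration and are not part of the answer above. When more than one condition is true for an element, the first matching entry in the condition list is used, and elements that match nothing get the default.

```python
import numpy as np

x = np.array([-1, 1, 5])

# The two conditions overlap: 5 satisfies both x > 3 and x > 0
conditions = [x > 3, x > 0]
choices = ["big", "positive"]

# The first matching condition wins; unmatched elements fall back to the default
labels = np.select(conditions, choices, default="none")
print(labels.tolist())  # ['none', 'positive', 'big']
```

Reordering the conditions changes the result for overlapping elements, so the order of the condition list acts as the priority.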