Creating New Rows From Single Cell Strings In Pandas Dataframe
I have a pandas dataframe with output scraped directly from a USDA text file. Below is an example of the dataframe: Date Region CommodityGroup
Solution 1:
- Use .str.split to split the column with the pattern ',| and ', which matches ',' or ' and '. The '|' is the regex OR.
- Use .explode to separate the list elements into separate rows.
- Optionally, use .reset_index(drop=True) after explode, depending on your needs: df = df.explode('CommodityGroup').reset_index(drop=True)
import pandas as pd
# data
data = {'Date': ['1/2/2019', '1/2/2019', '1/2/2019'],
'Region': ['Mexico Crossings', 'Eastern North Carolina', 'Michigan'],
'CommodityGroup': ['Beans,Cucumbers,Eggplant,Melons', 'Apples and Pears', 'Apples'],
'InboundCity': ['Atlanta', 'Baltimore', 'Boston'],
'Low': [4500, 7000, 3800],
'High': [4700, 8000, 4000]}
# create the dataframe
df = pd.DataFrame(data)
# split the CommodityGroup strings
df.CommodityGroup = df.CommodityGroup.str.split(',| and ')
# explode the CommodityGroup lists
df = df.explode('CommodityGroup')
# final
Date Region CommodityGroup InboundCity Low High
0 1/2/2019 Mexico Crossings Beans Atlanta 4500 4700
0 1/2/2019 Mexico Crossings Cucumbers Atlanta 4500 4700
0 1/2/2019 Mexico Crossings Eggplant Atlanta 4500 4700
0 1/2/2019 Mexico Crossings Melons Atlanta 4500 4700
1 1/2/2019 Eastern North Carolina Apples Baltimore 7000 8000
1 1/2/2019 Eastern North Carolina Pears Baltimore 7000 8000
2 1/2/2019 Michigan Apples Boston 3800 4000
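The same steps can also be written as a single chain that includes the optional index reset. This is a minimal sketch, not part of the original answer: it uses a trimmed two-column version of the sample data, and the name `result` is only illustrative. It assumes a pandas version that provides `DataFrame.explode` and treats a multi-character split pattern as a regular expression, which is the default behaviour of `.str.split`.

```python
import pandas as pd

# Trimmed sample data (only two of the original columns, for brevity)
df = pd.DataFrame({
    'Region': ['Mexico Crossings', 'Eastern North Carolina', 'Michigan'],
    'CommodityGroup': ['Beans,Cucumbers,Eggplant,Melons', 'Apples and Pears', 'Apples'],
})

result = (
    # split each string on ',' or ' and ' into a list of commodities
    df.assign(CommodityGroup=df['CommodityGroup'].str.split(',| and '))
      # give every list element its own row, repeating the other columns
      .explode('CommodityGroup')
      # replace the repeated index labels (0, 0, 0, 0, 1, 1, 2) with 0..6
      .reset_index(drop=True)
)
print(result)
```

Using `assign` leaves the original `df` unchanged, which can be handy if the unsplit strings are still needed later.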
Solution 2:
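Alternatively, set the other columns as the index, split and explode the remaining column, then reset the index. The chain is wrapped in parentheses so that it can span several lines:
df = (df.set_index(['Date', 'Region', 'InboundCity', 'Low', 'High'])
        .apply(lambda x: x.str.split(',| and ').explode())
        .reset_index())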
print(df)
Date Region InboundCity Low High CommodityGroup
0 1/2/2019 Mexico Crossings Atlanta 4500 4700 Beans
1 1/2/2019 Mexico Crossings Atlanta 4500 4700 Cucumbers
2 1/2/2019 Mexico Crossings Atlanta 4500 4700 Eggplant
3 1/2/2019 Mexico Crossings Atlanta 4500 4700 Melons
4 1/2/2019 Eastern North Carolina Baltimore 7000 8000 Apples
5 1/2/2019 Eastern North Carolina Baltimore 7000 8000 Pears
6 1/2/2019 Michigan Boston 3800 4000 Apples
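As the output shows, this approach moves CommodityGroup to the last column, because the index columns are restored first. If the original column order matters, one option is to remember it and reselect the columns afterwards. The sketch below is an illustration under that assumption; the names `original_columns`, `key_columns` and `exploded` are hypothetical and not part of the original answer.

```python
import pandas as pd

data = {'Date': ['1/2/2019', '1/2/2019', '1/2/2019'],
        'Region': ['Mexico Crossings', 'Eastern North Carolina', 'Michigan'],
        'CommodityGroup': ['Beans,Cucumbers,Eggplant,Melons', 'Apples and Pears', 'Apples'],
        'InboundCity': ['Atlanta', 'Baltimore', 'Boston'],
        'Low': [4500, 7000, 3800],
        'High': [4700, 8000, 4000]}
df = pd.DataFrame(data)

# remember the column order before reshaping
original_columns = list(df.columns)
# every column except the one being split becomes part of the index
key_columns = [c for c in original_columns if c != 'CommodityGroup']

exploded = (
    df.set_index(key_columns)
      .apply(lambda col: col.str.split(',| and ').explode())
      .reset_index()
)

# reselect the columns in their original order
exploded = exploded[original_columns]
print(exploded)
```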