
Creating New Rows From Single Cell Strings In Pandas Dataframe

I have a pandas dataframe with output scraped directly from a USDA text file. Below is an example of the dataframe: Date Region CommodityGroup

Solution 1:

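  • Use .str.split to split the column on the regular expression ',| and ', which matches either ',' or ' and ' ('|' means OR in a regex).
  • Use .explode to turn each list element into its own row. Every new row keeps the index label of the row it came from, so the index will contain repeated labels.
    • Optionally, use .reset_index(drop=True) after explode to get a fresh 0..n-1 index, depending on your needs.
      • df = df.explode('CommodityGroup').reset_index(drop=True)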
import pandas as pd

# data
data = {'Date': ['1/2/2019', '1/2/2019', '1/2/2019'],
        'Region': ['Mexico Crossings', 'Eastern North Carolina', 'Michigan'],
        'CommodityGroup': ['Beans,Cucumbers,Eggplant,Melons', 'Apples and Pears', 'Apples'],
        'InboundCity': ['Atlanta', 'Baltimore', 'Boston'],
        'Low': [4500, 7000, 3800],
        'High': [4700, 8000, 4000]}

# create the dataframe
df = pd.DataFrame(data)

# split the CommodityGroup strings
df['CommodityGroup'] = df['CommodityGroup'].str.split(',| and ')

# explode the CommodityGroup lists
df = df.explode('CommodityGroup')

# final
       Date                  Region CommodityGroup InboundCity   Low  High
0  1/2/2019        Mexico Crossings          Beans     Atlanta  4500  4700
0  1/2/2019        Mexico Crossings      Cucumbers     Atlanta  4500  4700
0  1/2/2019        Mexico Crossings       Eggplant     Atlanta  4500  4700
0  1/2/2019        Mexico Crossings         Melons     Atlanta  4500  4700
1  1/2/2019  Eastern North Carolina         Apples   Baltimore  7000  8000
1  1/2/2019  Eastern North Carolina          Pears   Baltimore  7000  8000
2  1/2/2019                Michigan         Apples      Boston  3800  4000
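
Scraped text often has stray spaces around the separators (for example 'Beans, Cucumbers'), in which case the pattern above would leave leading or trailing whitespace in the new rows. The following is a minimal sketch of one way to handle that, assuming a whitespace-tolerant pattern is acceptable for your data; the sample values with extra spaces are invented for illustration and are not from the USDA file.

```python
import pandas as pd

# Hypothetical sample with inconsistent spacing around the separators
df = pd.DataFrame({
    'Region': ['Mexico Crossings', 'Eastern North Carolina', 'Michigan'],
    'CommodityGroup': ['Beans, Cucumbers ,Eggplant', 'Apples and Pears', 'Apples'],
    'Low': [4500, 7000, 3800],
})

# Split on a comma with optional surrounding whitespace,
# or on the word 'and' surrounded by whitespace
pattern = r'\s*,\s*|\s+and\s+'

out = (
    df.assign(CommodityGroup=df['CommodityGroup'].str.strip().str.split(pattern))
      .explode('CommodityGroup')   # one row per list element
      .reset_index(drop=True)      # replace the repeated labels with 0..n-1
)
print(out)
```

Using assign leaves the original df untouched, which makes it easier to compare the result against the scraped input.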

Solution 2:

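You can also move the columns you want to keep into the index, split and explode the remaining CommodityGroup column, and then reset the index:

# wrap the chain in parentheses so it can span several lines
df = (df.set_index(['Date', 'Region', 'InboundCity', 'Low', 'High'])
        .apply(lambda x: x.str.split(',| and ').explode())
        .reset_index())
print(df)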

       Date                  Region InboundCity   Low  High CommodityGroup
0  1/2/2019        Mexico Crossings     Atlanta  4500  4700          Beans
1  1/2/2019        Mexico Crossings     Atlanta  4500  4700      Cucumbers
2  1/2/2019        Mexico Crossings     Atlanta  4500  4700       Eggplant
3  1/2/2019        Mexico Crossings     Atlanta  4500  4700         Melons
4  1/2/2019  Eastern North Carolina   Baltimore  7000  8000         Apples
5  1/2/2019  Eastern North Carolina   Baltimore  7000  8000          Pears
6  1/2/2019                Michigan      Boston  3800  4000         Apples
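
Note that in the output above CommodityGroup ends up as the last column, because reset_index puts the former index levels first. If the original column order matters, one option is to remember it before reshaping and reselect afterwards. This is a small sketch on a shortened version of the sample data; the variable names original_columns and out are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/2/2019', '1/2/2019'],
    'Region': ['Mexico Crossings', 'Michigan'],
    'CommodityGroup': ['Beans,Cucumbers', 'Apples'],
    'InboundCity': ['Atlanta', 'Boston'],
    'Low': [4500, 3800],
    'High': [4700, 4000],
})

# Remember the column order before the columns are moved into the index
original_columns = df.columns.tolist()

out = (
    df.set_index(['Date', 'Region', 'InboundCity', 'Low', 'High'])
      .apply(lambda x: x.str.split(',| and ').explode())
      .reset_index()
)

# Reselect the columns in their original order
out = out[original_columns]
print(out)
```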
