
Splitting A Column Into Multiple Columns With Specific Name In Pandas Dataframe

I have the following dataframe:

pri    sec
TOM    AB,CD,EF
JACK   XY,YZ
HARRY  FG
NICK   KY,NY,SD,EF,FR

I need the following output, with column names as follows (based on how many , sepa

Solution 1:

Use join + split + add_prefix:

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print (df)
     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1   JACK           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR

And if you need NaNs instead of None, add fillna:

import numpy as np

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print (df)
     pri             sec sec0 sec1 sec2 sec3 sec4
0    TOM        AB,CD,EF   AB   CD   EF  NaN  NaN
1   JACK           XY,YZ   XY   YZ  NaN  NaN  NaN
2  HARRY              FG   FG  NaN  NaN  NaN  NaN
3   NICK  KY,NY,SD,EF,FR   KY   NY   SD   EF   FR
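
For anyone who wants to run Solution 1 from scratch, here is a minimal self-contained sketch. The sample frame is rebuilt from the data in the question, and the intermediate names `parts` and `result` are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "pri": ["TOM", "JACK", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

# expand=True returns one column per piece, labelled 0, 1, 2, ...
# add_prefix then turns those labels into sec0, sec1, sec2, ...
parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# join aligns on the index, so each row keeps its own pieces
result = df.join(parts)
print(result)
```

The number of new columns follows the longest entry in `sec`, so nothing has to be counted by hand. Shorter rows are padded with missing values.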

Solution 2:

Try the following code (explanations as comments). It finds the maximum number of items in the "sec" column and creates the column names accordingly:

maxlen = max(list(map(lambda x: len(x.split(",")), df.sec)))  # find max number of items in 'sec' column
cols = ["sec" + str(x) for x in range(maxlen)]                # create new column names
datalist = list(map(lambda x: x.split(","), df.sec))          # create list of lists from entries in "sec"
newdf = pd.DataFrame(data=datalist, columns=cols)             # create dataframe of new columns
newdf = pd.concat([df, newdf], axis=1)                        # add it to original dataframe
print(newdf)

Output:

     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1   JACK           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR
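
One caveat with this approach: pd.concat with axis=1 aligns rows by index, and the new frame gets a fresh 0..n-1 index. If the original frame has a different index, the rows will not line up. The sketch below is one possible way to guard against that by reusing the original index. The helper name `split_into_columns` and its `sep` parameter are hypothetical, not part of pandas:

```python
import pandas as pd

def split_into_columns(df, column, sep=","):
    """Hypothetical helper: split df[column] on sep into column0, column1, ..."""
    pieces = [value.split(sep) for value in df[column]]
    width = max(len(p) for p in pieces)             # longest row decides the column count
    names = [f"{column}{i}" for i in range(width)]  # e.g. sec0, sec1, ...
    # Pad short rows explicitly so every row has the same number of entries
    padded = [p + [None] * (width - len(p)) for p in pieces]
    # Reuse the original index so concat lines the rows up correctly
    new_cols = pd.DataFrame(padded, columns=names, index=df.index)
    return pd.concat([df, new_cols], axis=1)

df = pd.DataFrame(
    {"pri": ["TOM", "JACK", "HARRY", "NICK"],
     "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"]},
    index=[10, 20, 30, 40],  # deliberately not 0..3
)
out = split_into_columns(df, "sec")
print(out)
```

With the default 0..3 index from the question this gives the same result as the code above. The difference only shows up when the frame has been filtered or re-indexed beforehand.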
