Convert a Column of Strings to Lists in Pandas

June 08, 2024

I have a problem with the type of one of the columns in my pandas DataFrame. The column is saved in a CSV file as a string, and I want to use it as a tuple to be able to conve

Solution 1:

Use str.strip and str.split:

df['LABELS'] = df['LABELS'].str.strip('()').str.split(',')

If the column contains no NaNs, a list comprehension works well too:

df['LABELS'] = [x.strip('()').split(',') for x in df['LABELS']]
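Note that str.split returns strings, and pieces after the first keep their leading space. A minimal sketch of converting them to floats, assuming the column holds tuple-like strings of numbers (the sample data below is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data mimicking a CSV column that was read as strings
df = pd.DataFrame({"ID": [1, 2], "LABELS": ["(1.0, 2.0, 3.0)", "(4.0, 5.0)"]})

# Strip the parentheses and split on commas; each piece is still a string
parts = df["LABELS"].str.strip("()").str.split(",")

# float() ignores surrounding whitespace, so ' 2.0' converts cleanly
df["LABELS"] = parts.apply(lambda items: [float(item) for item in items])

print(df["LABELS"].tolist())
```

If the values should stay as strings, calling item.strip() instead of float(item) would only remove the stray spaces.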

Solution 2:

You can use ast.literal_eval, which will give you a tuple:

import ast
df.LABELS = df.LABELS.apply(ast.literal_eval)

If you do want a list, use:

df.LABELS.apply(lambda s: list(ast.literal_eval(s)))
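ast.literal_eval raises an error on missing values, since it expects a string. One way to guard against that is sketched below; the helper name parse_labels and the choice to return an empty list for missing values are assumptions made for illustration, not part of the original answer:

```python
import ast

import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({"LABELS": ["(1.0, 2.0)", None, "(3.0,)"]})


def parse_labels(value):
    """Parse a tuple-like string into a list; return [] for non-string values."""
    if not isinstance(value, str):
        return []  # assumption: treat NaN/None as "no labels"
    return list(ast.literal_eval(value))


parsed = df["LABELS"].apply(parse_labels)
print(parsed.tolist())
```

Because literal_eval parses Python literals, the numbers come back as real floats rather than strings.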

Solution 3:

Sorry I was late to the party. So for other latecomers I got this to work based on the above replies:

df['hashtags'] = df.apply(lambda row: row['hashtags'].strip('[]').replace('"', '').replace(' ', '').split(','), axis=1)

I loaded a CSV with some columns that looked like ...,['hashtag1','hashtag2'],... and pandas loaded them as string objects. I used the code above to convert them to lists, then used "explode" to flatten the data.
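The convert-then-explode step described above might look roughly like this. It is a sketch on made-up data that uses the vectorized .str methods instead of a row-wise apply, and it strips single quotes because that is how the sample values are quoted:

```python
import pandas as pd

# Hypothetical sample data: list-like strings as they might come out of a CSV
df = pd.DataFrame({"id": [1, 2], "hashtags": ["['a','b']", "['c']"]})

# Remove the brackets and quotes, then split into a real Python list
df["hashtags"] = (
    df["hashtags"]
    .str.strip("[]")
    .str.replace("'", "", regex=False)
    .str.split(",")
)

# explode gives each list element its own row, repeating the other columns
flat = df.explode("hashtags").reset_index(drop=True)
print(flat)
```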

Solution 4:

You can try this (assuming your csv is called filename.csv):

import pandas as pd

df = pd.read_csv('filename.csv')

df['LABELS'] = df.LABELS.apply(lambda x: x.strip('()').split(','))

>>> df
   ID                               LABELS
0   1  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
1   2  [1.0, 2.0, 2.0, 3.0, 3.0, 1.0, 4.0]
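The conversion can also happen while the file is being read, using the converters argument of read_csv. The sketch below reads from an in-memory string rather than filename.csv so that it runs on its own; the CSV contents are invented for illustration:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV contents; the LABELS field is quoted because it contains commas
csv_text = 'ID,LABELS\n1,"(1.0, 2.0, 3.0)"\n2,"(4.0, 5.0)"\n'

# Each raw LABELS cell is passed to the converter as a string
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"LABELS": lambda s: list(ast.literal_eval(s))},
)

print(df["LABELS"].tolist())
```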

Solution 5:

Alternatively, you might consider regular expressions:

import re

pattern = re.compile(r"[0-9]\.[0-9]")
df.LABELS.apply(pattern.findall)
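A pattern with a single digit on each side of the dot would turn 12.5 into 2.5, and findall returns strings rather than numbers. A slightly more general sketch, assuming non-negative numbers without exponents (the pattern and sample data are illustrative, not from the original answer):

```python
import re

import pandas as pd

# Hypothetical sample data with multi-digit values and one integer
df = pd.DataFrame({"LABELS": ["(1.0, 12.5, 3)", "(10.25, 4.0)"]})

# One or more digits, optionally followed by a dot and more digits
pattern = re.compile(r"\d+(?:\.\d+)?")

# findall returns the matched substrings; convert each one to float
df["LABELS"] = df["LABELS"].apply(lambda s: [float(m) for m in pattern.findall(s)])

print(df["LABELS"].tolist())
```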
