Pandas Read_csv With Final Column Containing Commas

So I have a CSV dataset that by my book is well formed, and I'm trying to get the pandas package to load it correctly. The header consists of 5 column names, but the final column

Solution 1:

If it is feasible for you to replace { with "{ and } with }", the file can be read correctly with: pd.read_csv('data/training.dat', quotechar='"', skipinitialspace=True)
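As a rough illustration of that preprocessing step, the sketch below does the replacement in memory before handing the text to read_csv. The sample data and the use of io.StringIO are assumptions for the demo, and it only works as shown when the braced field contains no double quotes of its own.

```python
import io

import pandas as pd

# Hypothetical sample data: the last field contains commas but no double quotes.
raw = "A,B,C,D,E\n1,2,3,4,{K1:V1,K2:V2}\n"

# Wrap every braced field in double quotes so the CSV parser treats it as one value.
quoted = raw.replace("{", '"{').replace("}", '}"')

df = pd.read_csv(io.StringIO(quoted), quotechar='"', skipinitialspace=True)
print(df)
```

If the braced field itself contains double quotes (as real JSON would), the inner quotes would also need to be escaped, for example by doubling them.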

Edit:

Or go for a regular expression based solution:

In [205]:
print(pd.read_csv('a.data', sep=r",(?![^{]*\})", header=None, engine='python'))
   0  1  2  3              4
0  A  B  C  D              E
1  1  2  3  4  {K1:V1,K2:V2}

[2 rows x 5 columns]
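To see why that separator works, it may help to try the same pattern with re.split on a single line. The pattern matches a comma only when it is not followed by a run of non-{ characters ending in }, that is, when the comma is not inside a pair of braces. The sample line below is an assumption matching the output above.

```python
import re

# A comma that is NOT followed by "some characters without '{', then '}'".
pattern = re.compile(r",(?![^{]*\})")

line = "1,2,3,4,{K1:V1,K2:V2}"
fields = pattern.split(line)
print(fields)
```

Note that the lookahead only handles one level of braces; with nested braces, commas in the outer object can still be treated as separators.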

Solution 2:

I think it depends on what you're trying to do with the JSON. If you just want to ignore it, probably the easiest way is to set the comment char to {. (For this and the next, I've assumed you don't have any braces in your other columns.)

pd.read_csv(
    'woo.csv',
    comment='{' 
)
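A small self-contained check of this idea, using an in-memory copy of sample data (the data and the io.StringIO wrapper are assumptions for the demo): everything from the first { to the end of the line is discarded, so the final column should come back empty.

```python
import io

import pandas as pd

# Hypothetical sample: the last column holds a brace-delimited blob.
sample = (
    'A,B,C,D,E\n'
    '1,2,3,4,{"K1":"V1","K2":"V2"}\n'
    '3,2,3,4,{"K1": "V1", "K2": "V2"}\n'
)

# Treat '{' as a comment marker so the rest of each line is ignored.
df = pd.read_csv(io.StringIO(sample), comment='{')
print(df)
```

The column E is still present because of the header, but it contains only missing values.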

It is possible to extract elements from the JSON using a custom separator with read_csv, though I'm not at all convinced this is a sensible approach. Pandas will turn the separator into a column if it's a capturing group (it uses re.split internally), so I can get a column containing the JSON. Unfortunately I get a load of empty columns too because of that; hence the dropna.

I sent the JSON through loads and dumps, though obviously you'd want to do something more sensible. :)

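import json
import pandas as pd

# Round-trip each JSON string through loads/dumps (placeholder processing)
json_bit = lambda x: json.dumps(json.loads(x))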

pd.read_csv(
    'woo.csv', 
    sep=r'(\{.*\})$|,', 
    converters={'None.3': json_bit}
).dropna(axis=1)
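The effect of the capturing group can be seen in isolation with re.split, which is a rough stand-in here for what happens to each line. The sample line is an assumption; the point is that every plain comma contributes a None (the group did not participate in the match), which is where the empty columns come from.

```python
import json
import re

# Same separator as above: either a trailing {...} blob (captured) or a plain comma.
pattern = re.compile(r'(\{.*\})$|,')

line = '1,2,3,4,{"K1":"V1","K2":"V2"}'
parts = pattern.split(line)
print(parts)

# Keep only the real values and parse the captured JSON blob.
values = [p for p in parts if p]
record = json.loads(values[-1])
print(values[:-1], record)
```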

Sample CSV

A,B,C,D,E
1,2,3,4,{"K1":"V1","K2":"V2"}
3,2,3,4,{"K1": "V1", "k£": {"k3": "v3"},  "K2":"V2"}

Solution 3:

No need to preprocess the CSV file; just use the Python parser engine:

dataset = pd.read_csv('sample.csv', sep=',', engine='python')
