Pandas Read_csv With Final Column Containing Commas
Solution 1:
If it is feasible for you to replace { with "{ and } with }", the file can be read correctly with:
pd.read_csv('data/training.dat', quotechar='"', skipinitialspace=True)
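A minimal sketch of that preprocessing step, using an in-memory string instead of a real file (the sample data is illustrative):

```python
import io

import pandas as pd

# Hypothetical raw CSV where the final column is unquoted JSON-ish text.
raw = "A,B,C,D,E\n1,2,3,4,{K1:V1,K2:V2}\n"

# Wrap the braces in double quotes so quotechar='"' protects the inner commas.
quoted = raw.replace("{", '"{').replace("}", '}"')

df = pd.read_csv(io.StringIO(quoted), quotechar='"', skipinitialspace=True)
print(df["E"].iloc[0])  # the brace block survives as a single field
```

For a file on disk you would do the same replace on the file contents before handing them to read_csv.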
Edit:
Or go for a regular expression based solution:
In [205]: print(pd.read_csv('a.data', sep=",(?![^{]*\})", header=None))
   0  1  2  3              4
0  A  B  C  D              E
1  1  2  3  4  {K1:V1,K2:V2}

[2 rows x 5 columns]
Solution 2:
I think it depends on what you're trying to do with the JSON. If you just want to ignore it, probably the easiest way is to set the comment char to {
(For this and the next, I've assumed you don't have any braces in your other columns.)
pd.read_csv(
'woo.csv',
comment='{'
)
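As a quick sanity check, here is what comment='{' does to a row like the sample CSV further down (a sketch with in-memory data): everything from the first { onward is discarded, so the JSON column simply comes back as NaN.

```python
import io

import pandas as pd

csv_text = 'A,B,C,D,E\n1,2,3,4,{"K1":"V1","K2":"V2"}\n'

# comment='{' truncates each line at the first brace,
# so the data row is parsed as '1,2,3,4,' and E is empty.
df = pd.read_csv(io.StringIO(csv_text), comment="{")
print(df)
```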
It is possible to extract elements from the JSON using a custom separator with read_csv, though I'm not at all convinced this is a sensible approach. Pandas will turn the separator into a column if it's a capturing group (it uses re.split internally), so I can get a column containing the JSON. Unfortunately that also produces a load of empty columns, hence the dropna.

I sent the JSON through json.loads and json.dumps, though obviously you'd want to do something more sensible. :)
import json

json_bit = lambda x: json.dumps(json.loads(x))

pd.read_csv(
    'woo.csv',
    sep=r'(\{.*\})$|,',
    converters={'None.3': json_bit}
).dropna(axis=1)
Sample CSV
A,B,C,D,E
1,2,3,4,{"K1":"V1","K2":"V2"}
3,2,3,4,{"K1": "V1", "k£": {"k3": "v3"}, "K2":"V2"}
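To see where the empty columns come from, you can run the separator regex through re.split directly on a sample row (an illustrative sketch): the capturing group keeps the trailing JSON as one piece, while every plain-comma match inserts a None for the non-participating group, plus an empty segment before the JSON.

```python
import re

line = '1,2,3,4,{"K1":"V1","K2":"V2"}'

# Same pattern as the read_csv sep= above: capture a trailing {...} block,
# otherwise split on commas.
parts = re.split(r'(\{.*\})$|,', line)
print(parts)  # plain fields, None placeholders, empty strings, and the JSON
```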
Solution 3:
No need to preprocess the CSV file; just use the python engine:
dataset = pd.read_csv('sample.csv', sep=',', engine='python')