How To Remove Carriage Return In A Dataframe
I have a dataframe that contains columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in a row that has '\r' att
Solution 1:
Another solution is to use str.strip:
df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
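One caveat worth checking before relying on str.strip: its argument is a set of characters, not a pattern, so r'\\r' strips any leading or trailing backslash or letter r. The sketch below uses made-up sample values (the second one is hypothetical, chosen because it ends in "r") and compares strip with an anchored regex replace. It assumes the cells hold the literal two-character text backslash + r.

```python
import pandas as pd

# Hypothetical sample data: each value ends with the literal two-character
# text backslash + r, not with a real carriage return character.
s = pd.Series(["Uttar Pradesh\\r", "Srinagar\\r"])

# str.strip treats its argument as a set of characters to remove from both
# ends, so it also eats the final "r" of "Srinagar".
stripped = s.str.strip(r"\\r")

# An anchored regex removes only a trailing backslash + r.
anchored = s.str.replace(r"\\r$", "", regex=True)

print(stripped.tolist())  # ['Uttar Pradesh', 'Srinaga']
print(anchored.tolist())  # ['Uttar Pradesh', 'Srinagar']
```

If the values can legitimately end in "r" or a backslash, the anchored replace is probably the safer choice.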
If you want to use replace, add the r prefix and one extra \:
print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
In replace you can also specify the column to replace in, like:
print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
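To try the two replace forms above without the original file, here is a small self-contained sketch. The DataFrame is rebuilt by hand from the printed output, and it assumes the cells contain the literal text backslash + r, which is what the r'\\r' pattern matches.

```python
import pandas as pd

# Sample frame rebuilt from the printed output above (an assumption about
# the real data): "\\r" here is a literal backslash followed by "r".
df = pd.DataFrame({
    "id": ["location", "country_name", "total_deaths\\r"],
    "29": ["Uttar Pradesh\\r", "India", "20"],
})

# Nested dict: only look in column '29'.
only_29 = df.replace({"29": {r"\\r": ""}}, regex=True)

# Flat dict: look in every column.
everywhere = df.replace({r"\\r": ""}, regex=True)

print(only_29)
print(everywhere)
```

The nested-dict form leaves the id column untouched, which is handy when other columns may contain backslashes you want to keep.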
EDIT by comment:
import pandas as pd
df = pd.read_csv('data_source_test.csv')
print(df)
   id  country_name           location  total_deaths
0   1         India          New Delhi           354
1   2         India         Tamil Nadu            48
2   3         India          Karnataka             0
3   4         India      Andra Pradesh            32
4   5         India              Assam           679
5   6         India             Kerala           128
6   7         India             Punjab             0
7   8         India      Mumbai, Thane             1
8   9         India  Uttar Pradesh\r\n            20
9  10         India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id  country_name       location  total_deaths
0   1         India      New Delhi           354
1   2         India     Tamil Nadu            48
2   3         India      Karnataka             0
3   4         India  Andra Pradesh            32
4   5         India          Assam           679
5   6         India         Kerala           128
6   7         India         Punjab             0
7   8         India  Mumbai, Thane             1
8   9         India  Uttar Pradesh            20
9  10         India         Orissa            69
If you need to replace only in the location column (regex=True is needed on recent pandas versions, where str.replace matches literally by default):
df['location'] = df['location'].str.replace(r'\r\n', '', regex=True)
print(df)
   id  country_name       location  total_deaths
0   1         India      New Delhi           354
1   2         India     Tamil Nadu            48
2   3         India      Karnataka             0
3   4         India  Andra Pradesh            32
4   5         India          Assam           679
5   6         India         Kerala           128
6   7         India         Punjab             0
7   8         India  Mumbai, Thane             1
8   9         India  Uttar Pradesh            20
9  10         India         Orissa            69
Solution 2:
Use str.replace. You need to escape the backslash so that the pattern matches the literal text \r rather than a carriage return character:
In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df
Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
Solution 3:
The code below removes \t tabs, \n newlines and \r carriage returns, and is great for condensing data into one row. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a
df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["", ""], regex=True, inplace=True)
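A quick way to see what the two patterns in that call do is to run them on a small frame. The sample values below are invented for illustration; the first pattern targets the literal text \t, \n, \r and the second targets the real control characters. This sketch returns a new frame instead of modifying in place.

```python
import pandas as pd

# Invented sample data mixing real control characters and literal escapes.
df = pd.DataFrame({
    "location": [
        "Uttar Pradesh\r\n",  # real carriage return + newline
        "Mumbai,\tThane",     # real tab
        "Orissa\\n",          # literal backslash followed by "n"
    ],
    "total_deaths": [20, 1, 69],
})

cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],
    value=["", ""],
    regex=True,
)

print(cleaned["location"].tolist())
# ['Uttar Pradesh', 'Mumbai,Thane', 'Orissa']
```

Note that the tab is removed rather than turned into a space, so adjacent words get joined. Replace with " " instead of "" if that matters for your data.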
Solution 4:
Somehow, the accepted answer did not work for me. Ultimately, I found a solution by doing it as follows:
df["29"] = df["29"].replace(r'\r', '', regex=True)
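The difference is that I use \r instead of \\r. As a regex, \r matches a real carriage return character, while \\r matches a literal backslash followed by the letter r, so which one works depends on what the cell actually contains.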
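If you are not sure which case you have, a small check like the one below may help. It is only a sketch with two hand-made values: one ending in a real carriage return and one ending in the literal text backslash + r. Comparing string lengths is one simple way to tell them apart.

```python
import pandas as pd

s = pd.Series([
    "Uttar Pradesh\r",   # ends with a real carriage return (1 character)
    "Uttar Pradesh\\r",  # ends with backslash + "r" (2 characters)
])
print(s.str.len().tolist())  # [14, 15]

# Regex \r matches only the real carriage return.
real_removed = s.replace(r"\r", "", regex=True)

# Regex \\r matches only the literal backslash + r.
literal_removed = s.replace(r"\\r", "", regex=True)

print(real_removed.tolist())     # ['Uttar Pradesh', 'Uttar Pradesh\\r']
print(literal_removed.tolist())  # ['Uttar Pradesh\r', 'Uttar Pradesh']
```

Each pattern cleans only one of the two values, which would explain why different answers here work for different people.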
Solution 5:
Just assign the result of df.replace back to df and then print df.
df=df.replace({'\r': ''}, regex=True)
print(df)