How To Remove Carriage Return In A Dataframe

February 22, 2024

I have a dataframe that contains columns named id, country_name, location and total_deaths. While cleaning the data, I came across a value in a row that has '\r' att

Solution 1:

Another solution is to use str.strip:

df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
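One thing to keep in mind with str.strip is that its argument is treated as a set of characters, not as a pattern. The sketch below uses hypothetical sample values (not the data from the question) to show a trailing carriage return being stripped, and how stripping the characters \ and r can also eat a legitimate trailing r:

```python
import pandas as pd

# Hypothetical sample data: the first value ends with a real carriage return.
s = pd.Series(["Uttar Pradesh\r", "India", "20"])

# rstrip removes trailing characters found in the given set;
# here the set contains only the carriage return character.
cleaned = s.str.rstrip("\r")
print(cleaned.tolist())

# Caveat: r'\\r' as a strip argument means "the characters \ and r",
# so a value that legitimately ends in r loses that letter too.
over_stripped = pd.Series(["error\\r"]).str.strip(r"\\r")
print(over_stripped.tolist())  # ['erro']
```

If the values may end in the letter r, a replace with an explicit pattern is probably the safer choice.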

If you want to use replace, add the r prefix and one extra \:

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

With replace you can also restrict the replacement to a single column:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

EDIT by comment:

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n           209
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh           209
9  10        India         Orissa            69

If you need to replace only in the location column (recent pandas versions treat the str.replace pattern as plain text unless regex=True is passed):

df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh           209
9  10        India         Orissa            69
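Because the goal here is to remove a fixed character sequence, a plain-text (non-regex) replacement is a possible alternative. The following is a minimal sketch; the small frame is a hypothetical stand-in for a few rows of data_source_test.csv, and it assumes the file contains real carriage return and newline characters:

```python
import pandas as pd

# Hypothetical stand-in for a few rows of data_source_test.csv;
# "\r\n" is a real carriage return followed by a real newline.
df = pd.DataFrame({
    "id": [8, 9, 10],
    "country_name": ["India", "India", "India"],
    "location": ["Mumbai, Thane", "Uttar Pradesh\r\n", "Orissa"],
    "total_deaths": [1, 209, 69],
})

# regex=False treats the pattern as plain text, so no regex escaping is involved.
df["location"] = df["location"].str.replace("\r\n", "", regex=False)

print(df["location"].tolist())
```

Passing regex explicitly also keeps the behaviour the same across pandas versions whose defaults differ.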

Solution 2:

Use str.replace. The pattern r'\\r' escapes the backslash, so the regex matches a literal backslash followed by r rather than a carriage return character:

In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df

Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20

Solution 3:

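The code below removes \t tabs, \n newlines and \r carriage returns, both as literal backslash sequences and as actual control characters, which is useful for condensing data into one row. Replace <INPLACE> with True or False. The answer was taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a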

df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True, inplace=<INPLACE>)
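As a rough, runnable illustration of the same call, the sketch below uses a hypothetical one-column frame and assigns the result instead of modifying the frame in place (inplace is left at its default, which is an assumption of this example, not something the gist prescribes):

```python
import pandas as pd

# Hypothetical sample: the first row holds real control characters,
# the second holds the literal two-character sequences \t, \n and \r.
df = pd.DataFrame({"text": ["a\tb\nc\rd", "a\\tb\\nc\\rd"]})

# The first pattern matches the literal backslash sequences,
# the second matches the actual tab, newline and carriage return characters.
cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],
    value=["", ""],
    regex=True,
)

print(cleaned["text"].tolist())  # ['abcd', 'abcd']
```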

Solution 4:

Somehow, the accepted answer did not work for me. Ultimately, I found a solution by doing the following:

df["29"] = df["29"].replace(r'\r', '', regex=True)

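The difference is that I use \r instead of \\r. As a regex, \r matches an actual carriage return character, while \\r matches a literal backslash followed by the letter r, so which one works depends on what the data really contains.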
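A minimal sketch of that difference, using two hypothetical values: one ends with a real carriage return (a single character), the other with a literal backslash followed by r (two characters):

```python
import pandas as pd

# Hypothetical values for illustration only.
s = pd.Series(["Uttar Pradesh\r", "Uttar Pradesh\\r"])

# The regex \r matches only the real carriage return.
real_cr_removed = s.replace(r"\r", "", regex=True)

# The regex \\r matches only the literal backslash followed by r.
literal_removed = s.replace(r"\\r", "", regex=True)

print([repr(v) for v in real_cr_removed])
print([repr(v) for v in literal_removed])
```

Printing repr of a suspicious value (or checking its length) is a quick way to tell which of the two cases a given dataset falls into.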

Solution 5:

Assign the result of df.replace back to df, then print df:

df=df.replace({'\r': ''}, regex=True) 
print(df)
