
Fill Missing Values In Selected Columns With Filtered Values In Other Column

I have a weird column named null in a dataframe that contains some missing values from other columns. One column is lat-lon coordinates named location, the other is an integer repr

Solution 1:

The easiest approach, if not the most elegant, is to fill all the missing values in df.location and df.level with the values in df.null, then build a boolean filter with regex to set the inappropriate/misassigned values in df.location and df.level back to np.nan.

Series.fillna()

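import numpy as np
import pandas as pd
from numpy import nan

df = pd.DataFrame(
     {'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
     'location': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
     'level': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}
     }
)

# Fill every missing value in both columns from df['null'];
# assigning the result back avoids relying on inplace=True
for col in ['location', 'level']:
     df[col] = df[col].fillna(df['null'])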

Now we'll use regular expressions to reset the misassigned values.

Series.str.contains()

# Converting columns to type str so string methods work
df = df.astype(str)

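# Using regex to change values that don't belong in column to NaN
# A comma means the value is a lat-lon pair, not a level
regex = '[,]'
df.loc[df.level.str.contains(regex), 'level'] = np.nan

# A single digit (optionally followed by ".0") is a level, not a location
regex = r'^\d\.?0?$'
df.loc[df.location.str.contains(regex), 'location'] = np.nan
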
# Returning `df.level` to float datatype (str is the correct
# datatype for `df.location`); astype returns a new Series,
# so the result has to be assigned back
df['level'] = df['level'].astype(float)

Here's the output:

pd.DataFrame(
     {'null': {0: '43.70477575,-72.28844073', 1: '2', 2: '43.70637091,-72.28704334', 3: '4', 4: '3'},
      'location': {0: '43.70477575,-72.28844073', 1: nan, 2: '43.70637091,-72.28704334', 3: nan, 4: nan},
      'level': {0: nan, 1: 2.0, 2: nan, 3: 4.0, 4: 3.0}
     }
)
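One thing worth checking with the regex filter is how narrow the location pattern is: '^\d\.?0?$' only matches a single digit, optionally followed by ".0". The sketch below compares it with a broader pattern for any whole number. The broader pattern and the sample values '12' and '3.0' are assumptions for illustration and do not appear in the question's data.

```python
import pandas as pd

# Sample values; '12' and '3.0' are hypothetical additions for illustration
values = pd.Series(['43.70477575,-72.28844073', '2', '12', '3.0'])

narrow = r'^\d\.?0?$'       # pattern from the answer: one digit, optional ".0"
broad = r'^\d+(?:\.0+)?$'   # assumed variant: any whole number, optional ".0"

narrow_hits = values.str.contains(narrow)
broad_hits = values.str.contains(broad)

print(pd.DataFrame({'value': values, 'narrow': narrow_hits, 'broad': broad_hits}))
```

If levels can have more than one digit, the narrow pattern would leave them in df.location, so a broader pattern along these lines may be the safer choice.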

Solution 2:

Let us try pd.to_numeric, which with errors='coerce' turns anything that is not a number into NaN:

checker = pd.to_numeric(df.null, errors='coerce')
checker
Out[171]: 
0    NaN
1    2.0
2    NaN
3    4.0
4    3.0
Name: null, dtype: float64

Then apply isnull: a NaN in checker means the original value is a string (a coordinate pair), not a number.

isstring = checker.isnull()
isstring
Out[172]: 
0     True
1    False
2     True
3    False
4    False
Name: null, dtype: bool
isnumber = checker.notnull()

Then fill the values: strings go to location, numbers go to level.

df.loc[isstring, 'location'] = df['null']
df.loc[isnumber, 'level'] = df['null']
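Put together, the to_numeric approach might look like the sketch below. It rebuilds the sample frame so it runs on its own. Two choices here are assumptions rather than part of the answer: location is created with object dtype so that strings can be written into it (recent pandas versions may complain when strings are assigned into a float column), and level is filled from the parsed numbers rather than the raw strings so that it stays numeric.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'null': ['43.70477575,-72.28844073', '2', '43.70637091,-72.28704334', '4', '3'],
    # object dtype so the column can hold coordinate strings (assumption)
    'location': pd.Series([np.nan] * 5, dtype=object),
    'level': [np.nan] * 5,
})

# Anything that cannot be parsed as a number becomes NaN
as_number = pd.to_numeric(df['null'], errors='coerce')
is_number = as_number.notna()

# Non-numeric entries are coordinates; numeric entries are levels
df.loc[~is_number, 'location'] = df.loc[~is_number, 'null']
df.loc[is_number, 'level'] = as_number[is_number]

print(df)
```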

Solution 3:

Another approach might use the method pandas.Series.mask:

>>> df
                       null  location  level
0  43.70477575,-72.28844073       NaN    NaN
1                         2       NaN    NaN
2  43.70637091,-72.28704334       NaN    NaN
3                         4       NaN    NaN
4                         3       NaN    NaN
>>> # mask replaces values where the condition is True; where replaces them where it is False
>>> df['level'] = df.level.mask(df.null.str.isnumeric(), other = df.null)
>>> df['location'] = df.location.where(df.null.str.isnumeric(), other = df.null)
>>>
>>> df
                       null                  location level
0  43.70477575,-72.28844073  43.70477575,-72.28844073   NaN
1                         2                       NaN     2
2  43.70637091,-72.28704334  43.70637091,-72.28704334   NaN
3                         4                       NaN     4
4                         3                       NaN     3

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mask.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html
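As a self-contained sketch of the mask/where idea, the snippet below rebuilds the sample frame and assigns the results back to the columns instead of modifying them in place. Creating the two empty columns with object dtype and the name is_level are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'null': ['43.70477575,-72.28844073', '2', '43.70637091,-72.28704334', '4', '3'],
    # object dtype so both columns can receive strings (assumption)
    'location': pd.Series([np.nan] * 5, dtype=object),
    'level': pd.Series([np.nan] * 5, dtype=object),
})

# True where the entry consists only of digits, i.e. it is a level
is_level = df['null'].str.isnumeric()

# mask: replace where the condition is True; where: replace where it is False
df['level'] = df['level'].mask(is_level, other=df['null'])
df['location'] = df['location'].where(is_level, other=df['null'])

print(df)
```

Note that str.isnumeric only accepts strings made up entirely of digits, so values such as '3.0' or '-2' would be treated as locations here.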

