How Do I Fill The Previous Day's Not Null Value In My Pandas Dataframe

I want to fill null values in the current day with the value recorded at the same hour and minute on a previous day. Consider my dataframe below. As 2021-01-24 15:24:00 is NaN, the

Solution 1:

The approach here is to join the DataFrame back to itself to look up previous values. Two examples of this are provided:

  1. previous day
  2. same time of day, where the value is not NaN

The working columns are left in place for transparency.

import io
import pandas as pd

# sample data; the first (unnamed) column is the index
df = pd.read_csv(io.StringIO(""",creTimestamp,CPULoad,instnceId
0,2021-01-22 18:48:00,22.0,instanceA
1,2021-01-23 20:25:00,23.0,instanceA
2,2021-01-22 18:42:00,22.0,instanceA
3,2021-01-22 15:24:00,23.0,instanceB
4,2021-01-24 20:25:00,NaN,instanceA
5,2021-01-22 08:53:00,22.0,instanceA
6,2021-01-23 19:43:00,23.0,instanceB
7,2021-01-23 15:24:00,NaN,instanceA
8,2021-01-24 18:48:00,NaN,instanceA
9,2021-01-24 01:51:00,NaN,instanceB
10,2021-01-24 15:24:00,NaN,instanceA
"""), index_col=0)

df.creTimestamp = pd.to_datetime(df.creTimestamp)
# literally take the previous day's value: shift each timestamp back one day
# and look that timestamp up in the original frame
df2 = (df
 .assign(yesterday=lambda dfa: dfa.creTimestamp-pd.Timedelta(days=1))
 .merge(df.assign(yesterday=lambda dfa: dfa.creTimestamp)
        , on="yesterday", suffixes=("", "_pre"), how="left")
 .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
)
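
The previous-day lookup can be sketched on a tiny frame to show how far it reaches. This is a minimal illustration, not the code above: the three-row toy data is made up, and instnceId is added to the join key as an assumption so that one instance never borrows another's value.

```python
import pandas as pd

# Hypothetical toy data: one instance, same minute on three consecutive days.
df = pd.DataFrame({
    "creTimestamp": pd.to_datetime(
        ["2021-01-22 15:24:00", "2021-01-23 15:24:00", "2021-01-24 15:24:00"]
    ),
    "CPULoad": [23.0, None, None],
    "instnceId": ["instanceA"] * 3,
})

# Right-hand side: every reading, keyed by the timestamp it was taken at.
previous = df.rename(columns={"creTimestamp": "yesterday", "CPULoad": "CPULoad_pre"})

filled = (
    df.assign(yesterday=lambda d: d.creTimestamp - pd.Timedelta(days=1))
      .merge(previous, on=["instnceId", "yesterday"], how="left")
      .assign(CPULoad=lambda d: d.CPULoad.fillna(d.CPULoad_pre))
)
print(filled[["creTimestamp", "CPULoad"]])
```

Here 2021-01-23 is filled from 2021-01-22, but 2021-01-24 stays NaN because its previous day was itself NaN in the original frame. A literal previous-day join only looks back one day.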

# carry the value forward by time of day; beware if the DF has multiple values for the same time
df2 = (df
 .assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
 .merge(df.assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
        , on="timestamp", suffixes=("", "_pre"), how="left")
 .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
)
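
To see why repeated times of day are a problem, here is a small sketch on made-up data (the three-row frame is hypothetical). When the join key is not unique on the right-hand side, each left row matches every right row with the same key, so the result grows.

```python
import pandas as pd

# Hypothetical toy data: the same time of day recorded on three days.
df = pd.DataFrame({
    "creTimestamp": pd.to_datetime(
        ["2021-01-22 15:24:00", "2021-01-23 15:24:00", "2021-01-24 15:24:00"]
    ),
    "CPULoad": [23.0, None, None],
})
keyed = df.assign(timestamp=lambda d: d.creTimestamp.dt.time)

# Every left row matches every right row with the same time of day.
merged = keyed.merge(keyed, on="timestamp", suffixes=("", "_pre"), how="left")
print(len(df), len(merged))  # 3 rows in, 9 rows out
```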


       creTimestamp  CPULoad  instnceId  timestamp  CPULoad_pre
2021-01-22 18:48:00     22.0  instanceA   18:48:00         22.0
2021-01-23 20:25:00     23.0  instanceA   20:25:00         23.0
2021-01-22 18:42:00     22.0  instanceA   18:42:00         22.0
2021-01-22 15:24:00     23.0  instanceB   15:24:00         23.0
2021-01-24 20:25:00     23.0  instanceA   20:25:00         23.0
2021-01-22 08:53:00     22.0  instanceA   08:53:00         22.0
2021-01-23 19:43:00     23.0  instanceB   19:43:00         23.0
2021-01-23 15:24:00     23.0  instanceA   15:24:00         23.0
2021-01-24 18:48:00     22.0  instanceA   18:48:00         22.0
2021-01-24 01:51:00      NaN  instanceB   01:51:00          NaN
2021-01-24 15:24:00     23.0  instanceA   15:24:00         23.0


  • in a large dataframe (unlike this sample), the same timestamp can occur multiple times with different values
  • make the timestamp unique with drop_duplicates() so that merge() returns the same number of rows as the original DF
  • this means that NaN is filled with the last observed value for a timestamp
  • an additional key (instnceId) is added to the join
# carry the value forward by time of day; beware if the DF has multiple values for the same time
# take the last observed value to prevent merge() from generating duplicate rows
# also include instnceId in the join key...
df2 = (df
 .assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
 .merge(df.assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
        .loc[:,["instnceId", "timestamp","CPULoad"]]
        # drop rows with no reading first, so "last" means the last non-null value
        .dropna(subset=["CPULoad"])
        .drop_duplicates(subset=["instnceId","timestamp"], keep="last")
        , on=["instnceId","timestamp"], suffixes=("", "_pre"), how="left")
 .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
)
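
One way to guard against the row count changing is to let merge() check the key itself. The sketch below is an illustration on made-up data, not part of the original answer: the four-row frame and the name `lookup` are hypothetical, and sorting by creTimestamp before deduplicating is an added assumption so that "last" means most recent. `validate="many_to_one"` makes pandas raise a MergeError if the right-hand key is not unique.

```python
import pandas as pd

# Hypothetical toy data: two times of day, each seen once with a value and once without.
df = pd.DataFrame({
    "creTimestamp": pd.to_datetime([
        "2021-01-22 18:48:00", "2021-01-23 20:25:00",
        "2021-01-24 20:25:00", "2021-01-24 18:48:00",
    ]),
    "CPULoad": [22.0, 23.0, None, None],
    "instnceId": ["instanceA"] * 4,
})
keyed = df.assign(timestamp=lambda d: d.creTimestamp.dt.time)

# One row per (instance, time of day): the most recent non-null reading.
lookup = (
    keyed.dropna(subset=["CPULoad"])
         .sort_values("creTimestamp")
         .drop_duplicates(subset=["instnceId", "timestamp"], keep="last")
         .loc[:, ["instnceId", "timestamp", "CPULoad"]]
)

filled = (
    keyed.merge(lookup, on=["instnceId", "timestamp"], suffixes=("", "_pre"),
                how="left", validate="many_to_one")  # raises if lookup keys repeat
         .assign(CPULoad=lambda d: d.CPULoad.fillna(d.CPULoad_pre))
)
print(len(df) == len(filled), filled.CPULoad.tolist())
```

Note that the lookup takes the most recent reading anywhere in the frame, so in a larger dataset a NaN could be filled from a later day. If only earlier days should count, the lookup would need an extra date condition.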

