How Do I Fill The Previous Day's Not Null Value In My Pandas Dataframe
I want to fill a null value in the current day with the value from the same hour on a previous day. Consider my dataframe below. As 2021-01-24 15:24:00 is NaN, the
Solution 1:
The approach here is to join the DataFrame back to itself to pick up previous values. I have provided two examples of this:
- the previous day's value
- the value at the same time of day where it's not NaN
I have left the working columns in place for transparency.
import io
import pandas as pd

# sample data; "NaN" is read as a missing value by default
df = pd.read_csv(io.StringIO("""creTimestamp,CPULoad,instnceId
2021-01-22 18:48:00,22.0,instanceA
2021-01-23 20:25:00,23.0,instanceA
2021-01-22 18:42:00,22.0,instanceA
2021-01-22 15:24:00,23.0,instanceB
2021-01-24 20:25:00,NaN,instanceA
2021-01-22 08:53:00,22.0,instanceA
2021-01-23 19:43:00,23.0,instanceB
2021-01-23 15:24:00,NaN,instanceA
2021-01-24 18:48:00,NaN,instanceA
2021-01-24 01:51:00,NaN,instanceB
2021-01-24 15:24:00,NaN,instanceA
"""))
df.creTimestamp = pd.to_datetime(df.creTimestamp)
# literally take the previous day's value (the row exactly one day earlier)
df2 = (df
    .assign(yesterday=lambda dfa: dfa.creTimestamp - pd.Timedelta(days=1))
    .merge(df.rename(columns={"creTimestamp": "yesterday"}).loc[:, ["yesterday", "CPULoad"]],
           on="yesterday", suffixes=("", "_pre"), how="left")
    .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
)
# take timestamp forward, beware if DF has multiple values for same timestamp
df2 = (df
    .assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
    .merge(df.assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
             .loc[:, ["timestamp", "CPULoad"]]
             .dropna(),
           on="timestamp", suffixes=("", "_pre"), how="left")
    .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
)
Output:
creTimestamp         CPULoad  instnceId  timestamp  CPULoad_pre
2021-01-22 18:48:00  22.0     instanceA  18:48:00   22.0
2021-01-23 20:25:00  23.0     instanceA  20:25:00   23.0
2021-01-22 18:42:00  22.0     instanceA  18:42:00   22.0
2021-01-22 15:24:00  23.0     instanceB  15:24:00   23.0
2021-01-24 20:25:00  23.0     instanceA  20:25:00   23.0
2021-01-22 08:53:00  22.0     instanceA  08:53:00   22.0
2021-01-23 19:43:00  23.0     instanceB  19:43:00   23.0
2021-01-23 15:24:00  23.0     instanceA  15:24:00   23.0
2021-01-24 18:48:00  22.0     instanceA  18:48:00   22.0
2021-01-24 01:51:00  NaN      instanceB  01:51:00   NaN
2021-01-24 15:24:00  23.0     instanceA  15:24:00   23.0
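The warning in the code comment about repeated timestamps can be seen on a tiny case. The sketch below uses a hypothetical two-row lookup frame (the values 23.0 and 25.0 and the helper name with_time are made up for illustration) in which two non-null readings share the same time of day. A plain left merge then returns more rows than the left frame, while de-duplicating the lookup first keeps the row count.

```python
import pandas as pd

# Hypothetical data: one NaN row to fill, and two earlier readings
# that share the same time of day (15:24:00).
left = pd.DataFrame({
    "creTimestamp": pd.to_datetime(["2021-01-24 15:24:00"]),
    "CPULoad": [float("nan")],
})
right = pd.DataFrame({
    "creTimestamp": pd.to_datetime(["2021-01-22 15:24:00", "2021-01-23 15:24:00"]),
    "CPULoad": [23.0, 25.0],
})

def with_time(frame):
    # add the time-of-day column used as the join key
    return frame.assign(timestamp=frame.creTimestamp.dt.time)

lookup = with_time(right).loc[:, ["timestamp", "CPULoad"]]

# duplicate keys on the right side: one left row becomes two rows
dup = with_time(left).merge(lookup, on="timestamp",
                            suffixes=("", "_pre"), how="left")

# unique keys on the right side: row count is preserved
unique = with_time(left).merge(
    lookup.drop_duplicates(subset=["timestamp"], keep="last"),
    on="timestamp", suffixes=("", "_pre"), how="left")

print(len(left), len(dup), len(unique))
```

If a silent row blow-up is a concern, merge() also accepts validate="many_to_one", which raises an error when the right-hand keys are not unique.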
Updated
- in a large DataFrame (not the sample), there can be multiple rows with the same timestamp but different values
- make the timestamp unique using drop_duplicates() so merge() returns the same number of rows as the original DF; this means NaN is filled with the last observed value for a timestamp
- added an additional key (instnceId) to the join
# take timestamp forward, beware if DF has multiple values for same timestamp
# taking last observed value to prevent merge generating duplicates
# also include instnceId in join key...
df2 = (df
    .assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
    .merge(df.assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
             .loc[:, ["instnceId", "timestamp", "CPULoad"]]
             .dropna()
             .drop_duplicates(subset=["instnceId", "timestamp"], keep="last"),
           on=["instnceId", "timestamp"], suffixes=("", "_pre"), how="left")
    .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
    .drop(columns=["timestamp", "CPULoad_pre"])
)
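As a possible alternative to the self-join (a different technique, not the one used above), the same idea can be sketched with groupby() and ffill(): group rows by instance and time of day, order them chronologically, and carry the last non-null value forward. The mini sample below is hypothetical and only reuses the column names from the question. Unlike the merge, this fills only from earlier dates, never from later ones.

```python
import io
import pandas as pd

# Hypothetical mini sample using the question's column names.
df = pd.read_csv(io.StringIO("""creTimestamp,CPULoad,instnceId
2021-01-22 15:24:00,23.0,instanceA
2021-01-23 15:24:00,,instanceA
2021-01-24 15:24:00,,instanceA
2021-01-24 01:51:00,,instanceB
"""), parse_dates=["creTimestamp"])

filled = (df
    .assign(timestamp=lambda dfa: dfa.creTimestamp.dt.time)
    # chronological order so ffill carries values from earlier days
    .sort_values("creTimestamp")
    # forward-fill within each (instance, time-of-day) group
    .assign(CPULoad=lambda dfa: dfa
            .groupby(["instnceId", "timestamp"], sort=False)
            .CPULoad.ffill())
    .drop(columns="timestamp")
    # restore the original row order
    .sort_index()
)

print(filled)
```

Rows with no earlier non-null reading in their group, such as the instanceB row here, stay NaN.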