Add Data For The Missing Dates Based On Previous Hour Data In Pandas
I have a dataframe in which data is missing for the following dates: 2021-01-23 and 2021-01-25. I want to fill in the rows for 2021-01-23 and 2021-01-25 as well, using the data from the previous dates.
Solution 1:
This is a continuation of fill values
- Generate a DataFrame (df2) that is the combination of sampled hours and instances. This produces 15 rows, since there are 3 times for instanceA and 2 times for instanceB across 3 dates: (3+2)*3 = 15.
- Then use the same technique to fill both CPULoad and a synthesized memload column.
- Tested against pandas 1.0.1 as well as 1.2.0.
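The first step is in effect a Cartesian product of (instance, time) pairs and dates. Here is a minimal sketch of that step alone, on hypothetical toy frames (`slots`, `dates` and the `date` column are illustrative names, not the answer's data). It puts the dummy-key merge that the answer relies on next to `how="cross"`, which is only available from pandas 1.2.0 onward.

```python
import pandas as pd

# Hypothetical toy frames, not the answer's data
slots = pd.DataFrame({
    "instnceId": ["instanceA", "instanceA", "instanceB"],
    "timestamp": ["18:00", "19:00", "20:00"],
})
dates = pd.DataFrame({"date": ["2021-01-22", "2021-01-23"]})

# Dummy-key trick: every row carries the same key, so the merge pairs
# each slot with each date. This also works on pandas older than 1.2.0.
combos = (
    slots.assign(foo=1)
    .merge(dates.assign(foo=1), on="foo")
    .drop(columns="foo")
)

# pandas 1.2.0 and later can express the same product directly
combos_cross = slots.merge(dates, how="cross")

print(len(combos))  # 3 slots * 2 dates = 6
```

The dummy key is presumably what keeps the answer compatible with pandas 1.0.1; if only newer versions matter, `how="cross"` avoids the extra column.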
import pandas as pd
import io
import datetime as dt
import numpy as np
# columns are separated by two or more spaces, so the single space inside each timestamp survives
df = pd.read_csv(io.StringIO("""id  creTimestamp  CPULoad  instnceId
0  2021-01-22 18:00:00  22.0  instanceA
1  2021-01-22 19:00:00  22.0  instanceA
2  2021-01-22 20:00:00  23.0  instanceB
3  2021-01-23 18:00:00  24.0  instanceA
4  2021-01-23 20:00:00  22.0  instanceA
5  2021-01-24 18:00:00  23.0  instanceB
6  2021-01-24 20:00:00  23.5  instanceA
"""), sep=r"\s\s+", engine="python", index_col=0)
df.creTimestamp = pd.to_datetime(df.creTimestamp)
df["memload"] = np.random.random(len(df))
# generate a DF for each time in instance in each date
df2 = (
    pd.merge(
        # for each time in instance
        df.assign(timestamp=df.creTimestamp.dt.time)
        .loc[:, ["instnceId", "timestamp"]]
        .drop_duplicates()
        .assign(foo=1),
        # for each date
        df.creTimestamp.dt.date.drop_duplicates().to_frame().assign(foo=1),
        on="foo",
    )
    .assign(creTimestamp=lambda dfa: dfa.apply(lambda r: dt.datetime.combine(r["creTimestamp"], r["timestamp"]), axis=1))
    .drop(columns="foo")
    # merge values back..
    .merge(df, on=["creTimestamp", "instnceId"], how="left")
)
# now get values to fill NaN
df2 = (
    df2.merge(
        df2.dropna().drop_duplicates(subset=["instnceId", "timestamp"], keep="last"),
        on=["timestamp", "instnceId"],
        suffixes=("", "_pre"),
    )
    .assign(CPULoad=lambda dfa: dfa.CPULoad.fillna(dfa.CPULoad_pre))
    .assign(memload=lambda dfa: dfa.memload.fillna(dfa.memload_pre))
)
Output (CPULoad columns only; the random memload columns are not shown):
    instnceId  timestamp  creTimestamp         CPULoad  creTimestamp_pre     CPULoad_pre
0   instanceA  18:00:00   2021-01-22 18:00:00  22.0     2021-01-23 18:00:00  24.0
1   instanceA  18:00:00   2021-01-23 18:00:00  24.0     2021-01-23 18:00:00  24.0
2   instanceA  18:00:00   2021-01-24 18:00:00  24.0     2021-01-23 18:00:00  24.0
3   instanceA  19:00:00   2021-01-22 19:00:00  22.0     2021-01-22 19:00:00  22.0
4   instanceA  19:00:00   2021-01-23 19:00:00  22.0     2021-01-22 19:00:00  22.0
5   instanceA  19:00:00   2021-01-24 19:00:00  22.0     2021-01-22 19:00:00  22.0
6   instanceB  20:00:00   2021-01-22 20:00:00  23.0     2021-01-22 20:00:00  23.0
7   instanceB  20:00:00   2021-01-23 20:00:00  23.0     2021-01-22 20:00:00  23.0
8   instanceB  20:00:00   2021-01-24 20:00:00  23.0     2021-01-22 20:00:00  23.0
9   instanceA  20:00:00   2021-01-22 20:00:00  23.5     2021-01-24 20:00:00  23.5
10  instanceA  20:00:00   2021-01-23 20:00:00  22.0     2021-01-24 20:00:00  23.5
11  instanceA  20:00:00   2021-01-24 20:00:00  23.5     2021-01-24 20:00:00  23.5
12  instanceB  18:00:00   2021-01-22 18:00:00  23.0     2021-01-24 18:00:00  23.0
13  instanceB  18:00:00   2021-01-23 18:00:00  23.0     2021-01-24 18:00:00  23.0
14  instanceB  18:00:00   2021-01-24 18:00:00  23.0     2021-01-24 18:00:00  23.0
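Note that the fill above uses the last observed value for each instance and time (`keep="last"`), so a gap can be filled from a later date, as in row 9. It also only covers dates that already appear somewhere in the data. If the fill should come strictly from the previous date, and dates absent from the data altogether (such as 2021-01-25 in the question) should get rows too, one possible variation is a full calendar from `pd.date_range` combined with a grouped forward fill. This is a sketch on hypothetical toy data, not the answer's method; `all_days`, `day`, `offset`, `grid` and `filled` are illustrative names, and `how="cross"` assumes pandas 1.2.0 or later.

```python
import pandas as pd

# Hypothetical toy data: instanceA sampled at 18:00, with 2021-01-23
# and 2021-01-25 missing entirely
df = pd.DataFrame({
    "creTimestamp": pd.to_datetime(
        ["2021-01-22 18:00:00", "2021-01-24 18:00:00", "2021-01-26 18:00:00"]
    ),
    "CPULoad": [22.0, 23.0, 24.0],
    "instnceId": ["instanceA"] * 3,
})

# Every calendar day between the first and last sample, present or not
all_days = pd.date_range(
    df.creTimestamp.min().normalize(), df.creTimestamp.max().normalize(), freq="D"
)

# Split each timestamp into its day and its offset within that day
df = df.assign(
    day=df.creTimestamp.dt.normalize(),
    offset=df.creTimestamp - df.creTimestamp.dt.normalize(),
)

# One row per (instance, time-of-day) slot per calendar day
slots = df[["instnceId", "offset"]].drop_duplicates()
grid = slots.merge(pd.DataFrame({"day": all_days}), how="cross")

filled = (
    grid.merge(
        df[["instnceId", "offset", "day", "CPULoad"]],
        on=["instnceId", "offset", "day"],
        how="left",
    )
    .sort_values(["instnceId", "offset", "day"])
    .assign(
        # forward fill within each slot, so values only flow from earlier days
        CPULoad=lambda d: d.groupby(["instnceId", "offset"])["CPULoad"].ffill(),
        creTimestamp=lambda d: d.day + d.offset,
    )
    .reset_index(drop=True)
    .loc[:, ["creTimestamp", "instnceId", "CPULoad"]]
)
print(filled)
```

With a forward fill, a slot that has no earlier observation stays NaN instead of borrowing from a later date; whether that is preferable depends on what "previous" should mean for the data at hand.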