Used Groupby To Select Most Recent Data, Want To Append A Column That Returns The Date Of The Data
I originally had a dataframe that looked like this: industry population %of rural land country date Australia
Solution 1:
I think you need the year of the first row without NaNs. Create a helper Series s with dropna, then insert it into the grouped result:
s = df.dropna().reset_index(level=1)['date'].dt.year.groupby(level=0).first()
df1 = df.groupby(level=0).first()
df1.insert(0, 'year', df1.rename(s).index)
#alternative
#df1.insert(0, 'year', df1.index.to_series().map(s))
print (df1)
year industry population
country
Australia 2016 24.327571 18.898304
United States 2015 20.027274 19.028231
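To try the dropna approach end to end, here is a minimal sketch. The sample values and the three yearly dates are assumptions made up for illustration, since the question's full data is not shown; only the index layout (country, date) follows the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: MultiIndex (country, date), newest date first.
index = pd.MultiIndex.from_product(
    [["Australia", "United States"],
     pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01"])],
    names=["country", "date"],
)
df = pd.DataFrame(
    {"industry": [np.nan, 24.327571, 25.1, np.nan, np.nan, 20.027274],
     "population": [np.nan, 18.898304, 18.7, np.nan, np.nan, 19.028231]},
    index=index,
)

# Helper Series: year of the first row per country that has no NaNs.
s = (df.dropna()
       .reset_index(level=1)["date"]
       .dt.year
       .groupby(level=0)
       .first())

# First non-null value per column and country, plus the year from the helper.
df1 = df.groupby(level=0).first()
df1.insert(0, "year", df1.index.to_series().map(s))
print(df1)
```

Note that groupby(...).first() takes the first non-null value of each column separately, so the year from s describes a whole row only when the missing values in a row line up across columns, as they do in this sample.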
Another solution is to set date to NaN in every row that contains missing values, take the first value per group, and finally get the year with dt.year:
df1 = (df.reset_index(level=1)
         .assign(date=lambda x: x['date'].where(df.notnull().all(axis=1).values))
         .groupby(level=0).first()
         .assign(date=lambda x: x['date'].dt.year)
         .rename(columns={'date':'year'}))
print (df1)
year industry population
country
Australia 2016 24.327571 18.898304
United States 2015 20.027274 19.028231
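The same idea as a self-contained sketch: the date is blanked out wherever a row has a missing value, so first() skips it. The sample frame is again a made-up assumption, and the variable name complete is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a (country, date) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("Australia", pd.Timestamp("2017-01-01")),
     ("Australia", pd.Timestamp("2016-01-01")),
     ("United States", pd.Timestamp("2016-01-01")),
     ("United States", pd.Timestamp("2015-01-01"))],
    names=["country", "date"],
)
df = pd.DataFrame(
    {"industry": [np.nan, 24.327571, np.nan, 20.027274],
     "population": [np.nan, 18.898304, np.nan, 19.028231]},
    index=index,
)

# True for rows where every column has a value.
complete = df.notna().all(axis=1).to_numpy()

df1 = (
    df.reset_index(level=1)
      # Keep the date only on complete rows; other rows get NaT.
      .assign(date=lambda x: x["date"].where(complete))
      .groupby(level=0).first()
      # After first() each country has a real date, so .dt.year is safe here.
      .assign(date=lambda x: x["date"].dt.year)
      .rename(columns={"date": "year"})
)
print(df1)
```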
EDIT:
def f(x):
    #check NaNs
    m = x.isnull()
    #remove all NaNs columns
    m = m.loc[:, ~m.all()]
    #first index value of non NaNs rows
    m = m[~m.any(axis=1)].index[0][1].year
    return (m)
s = df.groupby(level=0).apply(f)
print (s)
country
Australia 2016
United States 2015
dtype: int64
df1 = df.groupby(level=0).first()
df1.insert(0, 'year', df1.rename(s).index)
#alternative
#df1.insert(0, 'year', df1.index.to_series().map(s))
print (df1)
year industry population %of rural land
country
Australia 2016 24.327571 18.898304 12.0
United States 2015 20.027274 19.028231 NaN
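A runnable sketch of the EDIT version, for the case where a column is empty for a whole country. The data below is hypothetical (chosen so that %of rural land is all NaN for the United States), and f is the function from the answer with a keyword axis argument.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: '%of rural land' is entirely missing for one country.
index = pd.MultiIndex.from_tuples(
    [("Australia", pd.Timestamp("2017-01-01")),
     ("Australia", pd.Timestamp("2016-01-01")),
     ("United States", pd.Timestamp("2016-01-01")),
     ("United States", pd.Timestamp("2015-01-01"))],
    names=["country", "date"],
)
df = pd.DataFrame(
    {"industry": [np.nan, 24.327571, np.nan, 20.027274],
     "population": [np.nan, 18.898304, np.nan, 19.028231],
     "%of rural land": [np.nan, 12.0, np.nan, np.nan]},
    index=index,
)

def f(x):
    # Boolean mask of missing values for this country's rows.
    m = x.isnull()
    # Ignore columns that are missing for the whole country.
    m = m.loc[:, ~m.all()]
    # Year of the first row with no missing values in the remaining columns.
    return m[~m.any(axis=1)].index[0][1].year

s = df.groupby(level=0).apply(f)

df1 = df.groupby(level=0).first()
df1.insert(0, "year", df1.index.to_series().map(s))
print(df1)
```

If a country has no row that is complete in its remaining columns, index[0] raises an IndexError, so real data may need a guard for that case.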