Used Groupby To Select Most Recent Data, Want To Append A Column That Returns The Date Of The Data

May 17, 2024

I originally had a dataframe that looked like this: industry population %of rural land country date Australia

Solution 1:

I think you need a helper Series holding, for each country, the year of the first row without NaNs. Create it with dropna, then insert it as a new column:

s = df.dropna().reset_index(level=1)['date'].dt.year.groupby(level=0).first()
df1 = df.groupby(level=0).first()
df1.insert(0, 'year', df1.rename(s).index)
#alternative
#df1.insert(0, 'year', df1.index.to_series().map(s))
print (df1)
               year   industry  population
country                                   
Australia      2016  24.327571   18.898304
United States  2015  20.027274   19.028231
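
Since the original dataframe is not shown in full, here is a minimal self-contained sketch of the same approach. The sample data is hypothetical: it assumes a (country, date) MultiIndex sorted newest date first, with NaNs in the most recent rows, and values chosen to match the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the asker's real frame):
# MultiIndex of (country, date), newest date first within each country.
dates = pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01"])
idx = pd.MultiIndex.from_product(
    [["Australia", "United States"], dates], names=["country", "date"]
)
df = pd.DataFrame(
    {
        "industry": [np.nan, 24.327571, 25.0, np.nan, np.nan, 20.027274],
        "population": [np.nan, 18.898304, 18.5, np.nan, np.nan, 19.028231],
    },
    index=idx,
)

# Year of the first fully populated row per country.
s = df.dropna().reset_index(level=1)["date"].dt.year.groupby(level=0).first()

# First non-NaN value of every column per country.
df1 = df.groupby(level=0).first()

# Map each country in the index to its year and put it in front.
df1.insert(0, "year", df1.index.to_series().map(s))
print(df1)
```

Note that groupby(...).first() picks the first non-NaN value of each column independently, so the year from dropna only describes every column if the missing values line up on the same rows.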

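Another solution is to replace the date with a missing value in every row that contains NaNs, take the first non-missing value per country, and finally extract the year with dt.year: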

df1 = (df.reset_index(level=1)
        .assign(date=lambda x: x['date'].where(df.notnull().all(axis=1).values))
        .groupby(level=0).first()
        .assign(date=lambda x: x['date'].dt.year)
        .rename(columns={'date':'year'}))
print (df1)
               year   industry  population
country                                   
Australia      2016  24.327571   18.898304
United States  2015  20.027274   19.028231

EDIT:

def f(x):
    #flag missing values
    m = x.isnull()
    #drop columns that are entirely NaN within this group
    m = m.loc[:, ~m.all()]
    #year of the first row without any NaN (index level 1 holds the date)
    m = m[~m.any(axis=1)].index[0][1].year
    return m

s = df.groupby(level=0).apply(f)
print (s)
country
Australia        2016
United States    2015
dtype: int64

df1 = df.groupby(level=0).first()
df1.insert(0, 'year', df1.rename(s).index)
#alternative
#df1.insert(0, 'year', df1.index.to_series().map(s))
print (df1)
               year   industry  population  %of rural land
country                                                   
Australia      2016  24.327571   18.898304            12.0
United States  2015  20.027274   19.028231             NaN
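
One thing to watch for: index[0] raises an IndexError if a country has no row that is complete across its non-empty columns. The following is only a sketch of a guarded variant, not part of the original answer. The function name first_complete_year, the sample data, and the choice to return NaN in that case are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: (country, date) MultiIndex, newest date first.
# "Canada" is made up so that no single row is complete.
dates = pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01"])
idx = pd.MultiIndex.from_product(
    [["Australia", "United States", "Canada"], dates],
    names=["country", "date"],
)
nan = np.nan
df = pd.DataFrame(
    {
        "industry": [nan, 24.327571, 25.0, nan, nan, 20.027274, 1.0, nan, nan],
        "population": [nan, 18.898304, 18.5, nan, nan, 19.028231, nan, 2.0, nan],
        "%of rural land": [nan, 12.0, 11.0, nan, nan, nan, nan, nan, nan],
    },
    index=idx,
)

def first_complete_year(group):
    """Year of the first row with no NaN, ignoring all-NaN columns."""
    missing = group.isnull()
    # Ignore columns that are entirely NaN for this country.
    missing = missing.loc[:, ~missing.all()]
    # Index entries (country, date) of rows without any NaN.
    complete = missing.index[(~missing.any(axis=1)).to_numpy()]
    # Assumed fallback: NaN when no complete row exists.
    return complete[0][1].year if len(complete) else np.nan

s = df.groupby(level=0).apply(first_complete_year)

df1 = df.groupby(level=0).first()
df1.insert(0, "year", df1.index.to_series().map(s))
print(df1)
```

Because one group yields NaN, the year column comes back as float here. If that is unwanted, such countries could be dropped or the column converted to a nullable integer type afterwards.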
