
Merge Data From Multiple Data Frames On Multiple Conditions

I want to merge multiple dataframes, but only if the keys match and the date range falls within 90 days of the 'InitialAdmit' date range in df1. I want to keep all rows from df1 an

Solution 1:

I still recommend merging first and then filtering. Here we use a Boolean mask and combine_first:

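# inner merge on Key; the shared InitialAdmit column becomes InitialAdmit_x (df1) and InitialAdmit_y (df2)
df = df1.merge(df2, on='Key')
# keep only rows whose df2 date falls between df1's InitialAdmit and its 90DayRange
m = (df.InitialAdmit_y >= df.InitialAdmit_x) & (df.InitialAdmit_y <= df['90DayRange'])
# fill the matching rows back onto df1 so every df1 row is kept
df1.set_index('Key').combine_first(df[m].set_index('Key'))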


Out[215]: 
          90DayRange InitialAdmit InitialAdmit_x InitialAdmit_y
Key                                                            
100000204 2012-09-02   2012-06-04            NaT            NaT
100000255 2012-08-01   2012-05-03     2012-05-03     2012-06-03
100000271 2012-04-15   2012-01-16            NaT            NaT
100000286 2013-01-24   2012-10-26     2012-10-26     2012-11-26
100000628 2012-05-21   2012-02-21            NaT            NaT
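
For anyone who wants to run this end to end, here is a minimal sketch of the same merge-then-filter idea. It is a variant, not the exact code above: it swaps combine_first for a second left merge back onto df1. The sample frames are an assumption reconstructed from the output shown, the 100000204 row in df2 is a hypothetical out-of-window record added to exercise the filter, and the `_df2` suffix is just an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df1 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000271, 100000286, 100000628],
    "InitialAdmit": pd.to_datetime(
        ["2012-06-04", "2012-05-03", "2012-01-16", "2012-10-26", "2012-02-21"]
    ),
})
df1["90DayRange"] = df1["InitialAdmit"] + pd.Timedelta(days=90)

df2 = pd.DataFrame({
    # The last row is hypothetical: same key as df1, but outside the 90-day window.
    "Key": [100000255, 100000286, 100000204],
    "InitialAdmit": pd.to_datetime(["2012-06-03", "2012-11-26", "2013-03-01"]),
})

# Step 1: inner merge pairs every df1 row with every df2 row sharing its Key.
pairs = df1.merge(df2, on="Key", suffixes=("", "_df2"))

# Step 2: keep only pairs whose df2 date lies inside [InitialAdmit, 90DayRange].
in_window = pairs["InitialAdmit_df2"].between(pairs["InitialAdmit"], pairs["90DayRange"])
matches = pairs.loc[in_window, ["Key", "InitialAdmit_df2"]]

# Step 3: left merge the surviving matches back so all df1 rows are kept.
result = df1.merge(matches, on="Key", how="left")
print(result)
```

If a key can have several in-window rows in df2, the last merge repeats the df1 row once per match, so deduplicate `matches` first if you need exactly one row per key.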

Solution 2:

Consider functools.reduce to chain the merges with a left join. The example below uses three copies of df2 and assumes InitialAdmit is the last column of each dataframe; reorder the columns as needed.

import pandas 
import numpy
from functools import reduce    
...

# LIST OF DATAFRAMES WITH SUFFIXING OF INITIALADMIT TO AVOID NAME COLLISION
dfList = [d.rename(columns={'InitialAdmit':'InitialAdmit_' + str(i)}) 
          for i,d  in enumerate([df1, df2, df2, df2])]

# USER-DEFINED METHOD CONDITIONING ON LAST COLUMN
def mergefilter(x, y):
    tmp = pandas.merge(x, y, on='Key', how='left')
    tmp.loc[~(tmp.iloc[:, -1].between(tmp['InitialAdmit_0'], tmp['90DayRange'])), 
            tmp.columns[-1]] = numpy.nan

    return tmp

finaldf = reduce(mergefilter, dfList)

print(finaldf)
#    90DayRange InitialAdmit_0        Key InitialAdmit_1 InitialAdmit_2 InitialAdmit_3
# 0  2012-09-02     2012-06-04  100000204            NaN            NaN            NaN
# 1  2012-08-01     2012-05-03  100000255     2012-06-03     2012-06-03     2012-06-03
# 2  2012-04-15     2012-01-16  100000271            NaN            NaN            NaN
# 3  2013-01-24     2012-10-26  100000286     2012-11-26     2012-11-26     2012-11-26
# 4  2012-05-21     2012-02-21  100000628            NaN            NaN            NaN
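
If relying on column position feels fragile, the reduce pattern can also address the new column by name. The following is only a sketch under assumptions: df1 and df2 are reconstructed from the printed output, df3 is a hypothetical third frame invented for illustration, and `add_admit` and the `InitialAdmit_1`/`InitialAdmit_2` labels are made-up names.

```python
from functools import reduce

import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df1 = pd.DataFrame({
    "Key": [100000204, 100000255, 100000271, 100000286, 100000628],
    "InitialAdmit": pd.to_datetime(
        ["2012-06-04", "2012-05-03", "2012-01-16", "2012-10-26", "2012-02-21"]
    ),
})
df1["90DayRange"] = df1["InitialAdmit"] + pd.Timedelta(days=90)

df2 = pd.DataFrame({
    "Key": [100000255, 100000286],
    "InitialAdmit": pd.to_datetime(["2012-06-03", "2012-11-26"]),
})

# Hypothetical extra frame: one date inside its window, one outside.
df3 = pd.DataFrame({
    "Key": [100000628, 100000255],
    "InitialAdmit": pd.to_datetime(["2012-03-01", "2012-12-31"]),
})


def add_admit(acc, item):
    """Left-join one frame onto the accumulator and blank out-of-window dates."""
    col, other = item
    other = other.rename(columns={"InitialAdmit": col})
    out = acc.merge(other[["Key", col]], on="Key", how="left")
    ok = out[col].between(out["InitialAdmit"], out["90DayRange"])
    out[col] = out[col].where(ok)  # dates outside the window become NaT
    return out


others = {"InitialAdmit_1": df2, "InitialAdmit_2": df3}

# df1 is the initial accumulator, so its rows and column names stay untouched.
finaldf = reduce(add_admit, others.items(), df1)
print(finaldf)
```

Passing df1 as the initial value of reduce means only the incoming frames are renamed, and the window bounds are always read from df1's own InitialAdmit and 90DayRange columns.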
