Merge Data From Multiple Data Frames On Multiple Conditions
I want to merge multiple data frames, but only where the keys match and the date falls within the 90-day range that starts at the 'InitialAdmit' date in df1. I want to keep all rows from df1 an
Solution 1:
I still recommend merging first and then filtering. Here we use a Boolean mask and combine_first:
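import pandas as pd
df = df1.merge(df2, on='Key')
# Keep df2 dates that fall inside df1's [InitialAdmit, 90DayRange] window
m = (df.InitialAdmit_y >= df.InitialAdmit_x) & (df.InitialAdmit_y <= df['90DayRange'])
# Keep every df1 row and fill in the matching df2 dates where they exist
df1.set_index('Key').combine_first(df[m].set_index('Key'))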
Out[215]:
90DayRange InitialAdmit InitialAdmit_x InitialAdmit_y
Key
100000204 2012-09-02 2012-06-04 NaT NaT
100000255 2012-08-01 2012-05-03 2012-05-03 2012-06-03
100000271 2012-04-15 2012-01-16 NaT NaT
100000286 2013-01-24 2012-10-26 2012-10-26 2012-11-26
100000628 2012-05-21 2012-02-21 NaT NaT
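A self-contained sketch of the merge-then-filter approach follows. The df1 rows are reconstructed from the output table above, and 90DayRange is computed as InitialAdmit plus 90 days, which matches every value shown. df2 is hypothetical sample data (the question does not show it), including one out-of-window row for key 100000204. Series.between stands in for the two explicit comparisons; it is inclusive at both ends by default.

```python
import pandas as pd

# df1: rows reconstructed from the output table above
df1 = pd.DataFrame({
    'Key': [100000204, 100000255, 100000271, 100000286, 100000628],
    'InitialAdmit': pd.to_datetime(['2012-06-04', '2012-05-03', '2012-01-16',
                                    '2012-10-26', '2012-02-21']),
})
# Window end: InitialAdmit + 90 days (matches the 90DayRange values shown)
df1['90DayRange'] = df1['InitialAdmit'] + pd.Timedelta(days=90)

# df2: hypothetical sample data; the 2013 row for 100000204 is outside its window
df2 = pd.DataFrame({
    'Key': [100000255, 100000286, 100000204],
    'InitialAdmit': pd.to_datetime(['2012-06-03', '2012-11-26', '2013-03-01']),
})

# Inner merge on Key; the shared InitialAdmit column becomes _x (df1) and _y (df2)
df = df1.merge(df2, on='Key')
# Keep only df2 dates inside df1's [InitialAdmit, 90DayRange] window
m = df['InitialAdmit_y'].between(df['InitialAdmit_x'], df['90DayRange'])
# combine_first keeps all df1 rows and adds the filtered matches where present
result = df1.set_index('Key').combine_first(df[m].set_index('Key'))
print(result)
```

Keys with no in-window match (here 100000204, 100000271 and 100000628) keep their df1 values and get NaT in the merged date columns.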
Solution 2:
Consider reduce for chaining the merges with a left join. The example below demonstrates this with three copies of df2. It also assumes InitialAdmit is the last column of each data frame; reorder the columns as needed.
import pandas
import numpy
from functools import reduce
...
# LIST OF DATAFRAMES WITH SUFFIXING OF INITIALADMIT TO AVOID NAME COLLISION
dfList = [d.rename(columns={'InitialAdmit': 'InitialAdmit_' + str(i)})
          for i, d in enumerate([df1, df2, df2, df2])]
# USER-DEFINED METHOD CONDITIONING ON LAST COLUMN
def mergefilter(x, y):
    tmp = pandas.merge(x, y, on='Key', how='left')
    tmp.loc[~(tmp.iloc[:, -1].between(tmp['InitialAdmit_0'], tmp['90DayRange'])),
            tmp.columns[-1]] = numpy.nan
    return tmp
finaldf = reduce(mergefilter, dfList)
print(finaldf)
# 90DayRange InitialAdmit_0 Key InitialAdmit_1 InitialAdmit_2 InitialAdmit_3
# 0 2012-09-02 2012-06-04 100000204 NaN NaN NaN
# 1 2012-08-01 2012-05-03 100000255 2012-06-03 2012-06-03 2012-06-03
# 2 2012-04-15 2012-01-16 100000271 NaN NaN NaN
# 3 2013-01-24 2012-10-26 100000286 2012-11-26 2012-11-26 2012-11-26
# 4 2012-05-21 2012-02-21 100000628 NaN NaN NaN
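Below is a self-contained sketch of the reduce approach. df1 is reconstructed from the output shown above, and df2 is hypothetical sample data, since the question does not show it. One design change from the answer: instead of assuming InitialAdmit is the last column, merge_filter (a hypothetical helper name) looks up the suffixed InitialAdmit_* column of the right-hand frame by name. It also blanks out-of-window dates with pd.NaT so the date columns stay datetime64.

```python
from functools import reduce
import pandas as pd

# df1: rows reconstructed from the output above; window end = InitialAdmit + 90 days
df1 = pd.DataFrame({
    'Key': [100000204, 100000255, 100000271, 100000286, 100000628],
    'InitialAdmit': pd.to_datetime(['2012-06-04', '2012-05-03', '2012-01-16',
                                    '2012-10-26', '2012-02-21']),
})
df1['90DayRange'] = df1['InitialAdmit'] + pd.Timedelta(days=90)

# df2: hypothetical sample data; the 2013 row for 100000204 is outside its window
df2 = pd.DataFrame({
    'Key': [100000255, 100000286, 100000204],
    'InitialAdmit': pd.to_datetime(['2012-06-03', '2012-11-26', '2013-03-01']),
})

# Suffix InitialAdmit with each frame's position to avoid name collisions
frames = [df1, df2, df2, df2]
renamed = [d.rename(columns={'InitialAdmit': f'InitialAdmit_{i}'})
           for i, d in enumerate(frames)]

def merge_filter(left, right):
    """Left-merge on Key, then blank right-hand dates outside df1's window."""
    # The new date column is the single InitialAdmit_* column in `right`
    col = next(c for c in right.columns if c.startswith('InitialAdmit_'))
    out = left.merge(right, on='Key', how='left')
    # NaT (no match) compares False in between(), so it is blanked as well
    outside = ~out[col].between(out['InitialAdmit_0'], out['90DayRange'])
    out.loc[outside, col] = pd.NaT
    return out

# reduce applies merge_filter pairwise: f(f(f(df1, df2), df2), df2)
final = reduce(merge_filter, renamed)
print(final)
```

Because every merge is a left join on a key that is unique in each right-hand frame, final keeps exactly one row per df1 key. If a right-hand frame can hold several rows per key, the left joins will multiply rows, and you would need to aggregate or deduplicate first.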