Pandas Merging On Different Size Dataframes Based On One Column
I have two dataframes of different sizes. In df1, I have date, time, username, email address, phone number, and duration from logs. But email address and phone number are just columns wit
Solution 1:
Use merge with a left join and the suffixes parameter, then remove the original email address and phone number columns that came from df1 (the ones that receive the _ suffix):
import pandas as pd

df1 = pd.DataFrame({
'username':list('abccdd'),
'email address':[''] * 6,
'phone number':[''] * 6,
'duration':[5,3,6,9,2,4],
})
print (df1)
   username  email address  phone number  duration
0         a                                       5
1         b                                       3
2         c                                       6
3         c                                       9
4         d                                       2
5         d                                       4
df2 = pd.DataFrame({
'username':list('abcd'),
'email address':['a@a.sk','b@a.sk','c@a.sk','d@a.sk'],
'phone number':range(4)
})
print (df2)
   username  email address  phone number
0         a         a@a.sk             0
1         b         b@a.sk             1
2         c         c@a.sk             2
3         d         d@a.sk             3
df = (df1.merge(df2, on='username', how='left', suffixes=('_',''))
.drop(['email address_','phone number_'], axis=1)
.reindex(columns=df1.columns))
print (df)
   username  email address  phone number  duration
0         a         a@a.sk             0         5
1         b         b@a.sk             1         3
2         c         c@a.sk             2         6
3         c         c@a.sk             2         9
4         d         d@a.sk             3         2
5         d         d@a.sk             3         4
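To see why the drop step is needed, it can help to look at the intermediate result of the merge. The following is a minimal sketch that rebuilds the same df1 and df2 so it runs on its own; the variable names merged and merged_columns are introduced here only for illustration. The overlapping columns from df1 receive the _ suffix, while the ones from df2 keep their plain names.

```python
import pandas as pd

df1 = pd.DataFrame({
    'username': list('abccdd'),
    'email address': [''] * 6,
    'phone number': [''] * 6,
    'duration': [5, 3, 6, 9, 2, 4],
})
df2 = pd.DataFrame({
    'username': list('abcd'),
    'email address': ['a@a.sk', 'b@a.sk', 'c@a.sk', 'd@a.sk'],
    'phone number': range(4),
})

# Left join only, without the drop/reindex steps, to inspect the raw result.
merged = df1.merge(df2, on='username', how='left', suffixes=('_', ''))

# Columns from df1 that clash with df2 now end in '_'.
merged_columns = list(merged.columns)
print(merged_columns)
```

Dropping the two suffixed columns and reindexing to df1.columns then yields the frame printed above.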
Another solution uses difference to select all column names except those listed, and reindex to restore the same column order as in df1:
c = df1.columns.difference(['email address','phone number'])
df = df1[c].merge(df2, on='username', how='left').reindex(columns=df1.columns)
print (df)
   username  email address  phone number  duration
0         a         a@a.sk             0         5
1         b         b@a.sk             1         3
2         c         c@a.sk             2         6
3         c         c@a.sk             2         9
4         d         d@a.sk             3         2
5         d         d@a.sk             3         4
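If df2 contained the same username more than once, a left merge would duplicate the matching rows of df1, and usernames missing from df2 would silently end up with NaN. A hedged sketch of how both cases could be checked with the validate and indicator parameters of merge; the sample data, including the unmatched user 'e', is made up for illustration and is not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'username': list('abce'),   # 'e' has no entry in df2 (hypothetical)
    'duration': [5, 3, 6, 9],
})
df2 = pd.DataFrame({
    'username': list('abcd'),
    'email address': ['a@a.sk', 'b@a.sk', 'c@a.sk', 'd@a.sk'],
})

# validate='many_to_one' raises a MergeError if df2 has duplicate usernames.
# indicator=True adds a '_merge' column telling where each row was found.
checked = df1.merge(df2, on='username', how='left',
                    validate='many_to_one', indicator=True)

# Usernames from df1 that found no match in df2.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'username'].tolist()
print(unmatched)
```

The _merge column can be dropped again once the check is done.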
Solution 2:
You can use this:
df = df1[['username', 'date', 'time', 'duration']].merge(df2, on='username')
Example: df1
   date  duration  email address  phone number   time  username
0  2015         5                               14:00        aa
1  2016        10                               16:00        bb
df2
   email address  phone number  username
0           rrr@        333444        aa
1            tt@        555533        bb
Output:
   username  date   time  duration  email address  phone number
0        aa  2015  14:00         5           rrr@        333444
1        bb  2016  16:00        10            tt@        555533
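The example above can be reproduced with the sketch below. The values are typed in from the tables; the dtypes (integers for date, duration and phone number, strings for time) and the names cols, partial, inner and left are assumptions made for illustration. Note that merge performs an inner join by default, so a username missing from df2 would drop out of the result, whereas how='left' keeps it.

```python
import pandas as pd

df1 = pd.DataFrame({
    'date': [2015, 2016],
    'duration': [5, 10],
    'email address': ['', ''],
    'phone number': ['', ''],
    'time': ['14:00', '16:00'],
    'username': ['aa', 'bb'],
})
df2 = pd.DataFrame({
    'email address': ['rrr@', 'tt@'],
    'phone number': [333444, 555533],
    'username': ['aa', 'bb'],
})

# Keep only the df1 columns that should survive, then join on username.
cols = ['username', 'date', 'time', 'duration']
df = df1[cols].merge(df2, on='username')
print(df)

# Hypothetical case: df2 has no row for 'bb'.
partial = df2[df2['username'] == 'aa']
inner = df1[cols].merge(partial, on='username')              # drops 'bb'
left = df1[cols].merge(partial, on='username', how='left')   # keeps 'bb' with NaN
print(len(inner), len(left))
```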