
Pandas Merging On Different Size Dataframes Based On One Column

I have two dataframes of different sizes. In df1, I have date, time, username, email address, phone number and duration from logs. But email address and phone number are just columns wit…

Solution 1:

Use merge with a left join and the suffixes parameter, then remove the original (empty) email address and phone number columns from df1, which end with _ after the merge:

import pandas as pd

df1 = pd.DataFrame({
    'username': list('abccdd'),
    'email address': [''] * 6,
    'phone number': [''] * 6,
    'duration': [5, 3, 6, 9, 2, 4],
})
print(df1)
  username email address phone number  duration
0        a                                    5
1        b                                    3
2        c                                    6
3        c                                    9
4        d                                    2
5        d                                    4

df2 = pd.DataFrame({
    'username': list('abcd'),
    'email address': ['a@a.sk', 'b@a.sk', 'c@a.sk', 'd@a.sk'],
    'phone number': range(4),
})
print(df2)
  username email address  phone number
0        a        a@a.sk             0
1        b        b@a.sk             1
2        c        c@a.sk             2
3        d        d@a.sk             3

df = (df1.merge(df2, on='username', how='left', suffixes=('_',''))
        .drop(['email address_','phone number_'], axis=1)
        .reindex(columns=df1.columns))
print(df)
  username email address  phone number  duration
0        a        a@a.sk             0         5
1        b        b@a.sk             1         3
2        c        c@a.sk             2         6
3        c        c@a.sk             2         9
4        d        d@a.sk             3         2
5        d        d@a.sk             3         4
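To see what suffixes=('_','') actually does, it can help to look at the merged frame before the drop step. This is a minimal sketch that rebuilds the same sample df1 and df2 so it runs on its own; the names merged and result are only illustrative. Overlapping non-key columns from the left frame get the _ suffix, while the columns from the right frame keep their original names, which is why the suffixed columns are the empty ones to drop.

```python
import pandas as pd

# Same sample data as above: df1 has empty contact columns, df2 has the values.
df1 = pd.DataFrame({
    'username': list('abccdd'),
    'email address': [''] * 6,
    'phone number': [''] * 6,
    'duration': [5, 3, 6, 9, 2, 4],
})
df2 = pd.DataFrame({
    'username': list('abcd'),
    'email address': ['a@a.sk', 'b@a.sk', 'c@a.sk', 'd@a.sk'],
    'phone number': range(4),
})

# Left join on username: df1's overlapping columns get the '_' suffix,
# df2's columns keep their names (empty suffix).
merged = df1.merge(df2, on='username', how='left', suffixes=('_', ''))
print(list(merged.columns))
# ['username', 'email address_', 'phone number_', 'duration',
#  'email address', 'phone number']

# The suffixed columns are df1's empty originals, so drop them and
# restore df1's column order.
result = (merged.drop(columns=['email address_', 'phone number_'])
                .reindex(columns=df1.columns))
print(result)
```

Using drop(columns=...) here is equivalent to drop([...], axis=1) in the answer; it just names the axis explicitly.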

Another solution: use Index.difference to select all column names of df1 except the ones in the list, merge, and then use reindex to restore the same column order as in df1:

c = df1.columns.difference(['email address','phone number'])
df = df1[c].merge(df2, on='username', how='left').reindex(columns=df1.columns)

print(df)
  username email address  phone number  duration
0        a        a@a.sk             0         5
1        b        b@a.sk             1         3
2        c        c@a.sk             2         6
3        c        c@a.sk             2         9
4        d        d@a.sk             3         2
5        d        d@a.sk             3         4
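The reindex step matters because Index.difference returns the remaining labels sorted, not in their original order. The sketch below, using the same sample data, shows that ordering and also a swapped-in alternative that is not part of the answer: df1.drop(columns=...) keeps the remaining columns in their original order, but merge still appends df2's columns at the end, so reindex is needed either way.

```python
import pandas as pd

df1 = pd.DataFrame({
    'username': list('abccdd'),
    'email address': [''] * 6,
    'phone number': [''] * 6,
    'duration': [5, 3, 6, 9, 2, 4],
})
df2 = pd.DataFrame({
    'username': list('abcd'),
    'email address': ['a@a.sk', 'b@a.sk', 'c@a.sk', 'd@a.sk'],
    'phone number': range(4),
})

# Index.difference sorts the result by default, so df1's order is lost.
c = df1.columns.difference(['email address', 'phone number'])
print(list(c))  # ['duration', 'username']

# Alternative (not from the answer): drop() preserves the original order
# of the columns that remain.
kept = df1.drop(columns=['email address', 'phone number'])
print(list(kept.columns))  # ['username', 'duration']

# merge() appends df2's non-key columns after kept's columns,
# so reindex is still what restores df1's column order.
df = kept.merge(df2, on='username', how='left').reindex(columns=df1.columns)
print(list(df.columns))
```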

Solution 2:

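Alternatively, select only the columns of df1 you need (leaving out the empty email address and phone number columns) and merge the result with df2 on username. Since merge defaults to an inner join, any row of df1 whose username is missing from df2 is dropped; pass how='left' to keep such rows:

df = df1[['username', 'date', 'time', 'duration']].merge(df2, on='username')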

Example: df1

   date  duration email address phone number   time username
0  2015         5                             14:00       aa
1  2016        10                             16:00       bb

df2

  email address   phone number username
0          rrr@         333444       aa
1           tt@         555533       bb

Output:

  username  date   time  duration email address   phone number
0       aa  2015  14:00         5          rrr@         333444
1       bb  2016  16:00        10           tt@         555533
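A minimal sketch contrasting the default inner join with how='left'. The data is hypothetical: it follows the shape of the example above but adds a third user, cc, with invented values and no matching row in df2. With the inner join that row disappears; with the left join it is kept and its contact fields become NaN.

```python
import pandas as pd

# Hypothetical data shaped like the example above, plus user 'cc' (invented)
# that has no row in df2.
df1 = pd.DataFrame({
    'date': [2015, 2016, 2017],
    'duration': [5, 10, 7],
    'email address': ['', '', ''],
    'phone number': ['', '', ''],
    'time': ['14:00', '16:00', '18:00'],
    'username': ['aa', 'bb', 'cc'],
})
df2 = pd.DataFrame({
    'email address': ['rrr@', 'tt@'],
    'phone number': [333444, 555533],
    'username': ['aa', 'bb'],
})

cols = ['username', 'date', 'time', 'duration']

# Default how='inner': 'cc' has no match in df2, so its row is dropped.
inner = df1[cols].merge(df2, on='username')
# how='left': every row of df1 is kept; unmatched contact fields are NaN.
left = df1[cols].merge(df2, on='username', how='left')

print(inner['username'].tolist())  # ['aa', 'bb']
print(left['username'].tolist())   # ['aa', 'bb', 'cc']
```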
