Merging Two Pandas Dataframes Results In "duplicate" Columns
Solution 1:
The reason you have additional columns with suffixes '_x' and '_y' is because the columns you are merging do not have matching values so this clash produces additional columns. In that case you need to drop the additional '_y' columns and rename the '_x' columns:
In [145]:
# define our drop functiondefdrop_y(df):
# list comprehension of the cols that end with '_y'
to_drop = [x for x in df if x.endswith('_y')]
df.drop(to_drop, axis=1, inplace=True)
drop_y(merged)
merged
Out[145]:
key dept_name_x res_name_x year_x need holding \
0 DeptA_ResA_2015 DeptA ResA 2015111 DeptA_ResA_2016 DeptA ResA 2016112 DeptA_ResA_2017 DeptA ResA 201711
no_of_inv inv_cost_wo_ice
011000000100200
In [146]:
# func to rename '_x' colsdefrename_x(df):
for col in df:
if col.endswith('_x'):
df.rename(columns={col:col.rstrip('_x')}, inplace=True)
rename_x(merged)
merged
Out[146]:
key dept_name res_name year need holding no_of_inv \
0 DeptA_ResA_2015 DeptA ResA 20151111 DeptA_ResA_2016 DeptA ResA 20161102 DeptA_ResA_2017 DeptA ResA 2017110
inv_cost_wo_ice
010000001020
EDIT If you added the common columns to your merge then it shouldn't produce the duplicated columns unless the matches on those columns do not match:
merge_df = pd.merge(holding_df, invest_df, on=['key', 'dept_name', 'res_name', 'year'], how='left').fillna(0)
Solution 2:
I have the same problem with duplicate columns after left joins even when the columns' data is identical. I did a query and found out that NaN values are considered different even if both columns are NaN in pandas 0.14. BUT once you upgrade to 0.15, this problem disappears, which explains why it later works for you, you probably upgraded.
Solution 3:
Not exactly the answer, but pd.merge
provides an argument to help you decide which suffixes should be added to your overlapping columns:
merge_df = pd.merge(holding_df, invest_df, on='key', how='left', suffixes=('_holding', '_invest')).fillna(0)
More meaningful names could be helpful if you decide to keep both (or to check why the columns are kept).
See documentation for more reference.
Post a Comment for "Merging Two Pandas Dataframes Results In "duplicate" Columns"