Pandas Return Columns In Dataframe That Are Not In Other Dataframe
I have two dataframes that look like this: df_1 = pd.DataFrame({ 'A' : [1.0, 2.0, 3.0, 4.0], 'B' : [100, 200, 300, 400], 'C' : [2, 3, 4, 5] }) df_2 = pd.DataFr
Solution 1:
Pandas Index objects have set-like methods, so you can directly do:
df_2.columns.difference(df_1.columns)
Index([u'D'], dtype='object')
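For a runnable version, here is a minimal sketch. The values in df_2 are made up for illustration, since the question's definition is cut off; only its column labels B, C and D are inferred from the outputs shown in the answers.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [100, 200, 300, 400],
    'C': [2, 3, 4, 5],
})
# Hypothetical values: only the column labels B, C, D are inferred from the answers.
df_2 = pd.DataFrame({
    'B': [100, 200, 300, 400],
    'C': [2, 3, 4, 5],
    'D': [10, 20, 30, 40],
})

# Labels present in df_2 but not in df_1
only_in_df_2 = df_2.columns.difference(df_1.columns)
print(list(only_in_df_2))

# The resulting Index can be used to select those columns from df_2
print(df_2[only_in_df_2])
```

The second print shows the actual columns, which is often what is wanted rather than just their labels.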
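In older pandas versions you can also use the operators &, | and ^ to compute the intersection, union and symmetric difference (since pandas 2.0 these operators act as element-wise logical operations on an Index instead, so prefer the methods .intersection(), .union() and .symmetric_difference() there):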
df_1.columns & df_2.columns
Index([u'B', u'C'], dtype='object')
df_1.columns | df_2.columns
Index([u'A', u'B', u'C', u'D'], dtype='object')
df_1.columns ^ df_2.columns
Index([u'A', u'D'], dtype='object')
There used to be a - operator for difference, which is now deprecated:
df_2.columns - df_1.columns
FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
Index([u'D'], dtype='object')
Solution 2:
NumPy solution with numpy.setdiff1d:
a = np.setdiff1d(df_2.columns, df_1.columns)
print (a)
['D']
Pandas solution with Index.difference:
a = df_2.columns.difference(df_1.columns)
print (a)
Index(['D'], dtype='object')
Other pandas methods are intersection, union and symmetric_difference:
print (df_2.columns.intersection(df_1.columns))
Index(['B', 'C'], dtype='object')
print (df_2.columns.union(df_1.columns))
Index(['A', 'B', 'C', 'D'], dtype='object')
print (df_2.columns.symmetric_difference(df_1.columns))
Index(['A', 'D'], dtype='object')
The corresponding NumPy functions are intersect1d, union1d and setxor1d:
print (np.intersect1d(df_2.columns, df_1.columns))
['B' 'C']
print (np.union1d(df_2.columns, df_1.columns))
['A' 'B' 'C' 'D']
print (np.setxor1d(df_2.columns, df_1.columns))
['A' 'D']
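Note that the NumPy functions return plain arrays rather than Index objects, and the results come back sorted and de-duplicated. A small sketch, using hypothetical empty frames that carry only column labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frames; df_2's columns are deliberately not in alphabetical order
df_1 = pd.DataFrame(columns=['A', 'B', 'C'])
df_2 = pd.DataFrame(columns=['D', 'C', 'B'])

diff = np.setdiff1d(df_2.columns, df_1.columns)
print(type(diff).__name__)   # a NumPy array, not a pandas Index
print(diff.tolist())

# The shared labels come back sorted, regardless of their order in df_2
common = np.intersect1d(df_2.columns, df_1.columns)
print(common.tolist())

# The array still works as a column selector
print(df_2[diff].columns.tolist())
```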
Solution 3:
You can use:
set(df_2.columns.values) - set(df_1.columns.values)
which returns a set containing the labels of the columns that are in df_2 but not in df_1.
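A set has no defined order, so it is safer to turn it into a sorted list before using it to select columns. A minimal sketch with hypothetical single-row frames (only the column labels matter here):

```python
import pandas as pd

# Hypothetical data; the labels mirror those in the question
df_1 = pd.DataFrame({'A': [1.0], 'B': [100], 'C': [2]})
df_2 = pd.DataFrame({'B': [100], 'C': [2], 'D': [10]})

extra = set(df_2.columns) - set(df_1.columns)

# Sort for a deterministic order, then select those columns from df_2
extra_cols = sorted(extra)
print(df_2[extra_cols])
```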
Solution 4:
Here it is, using set.difference:
set(df_2.columns).difference(df_1.columns)
Out[76]: {'D'}
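Sets do not keep the original column order, and Index.difference sorts its result by default. If the order in df_2 matters, a boolean mask built with Index.isin is one option. A sketch with hypothetical column labels:

```python
import pandas as pd

# Hypothetical frames; df_2 has two extra columns, deliberately out of alphabetical order
df_1 = pd.DataFrame(columns=['A', 'B', 'C'])
df_2 = pd.DataFrame(columns=['Z', 'B', 'D', 'C'])

# True for every df_2 column label that does not appear in df_1
mask = ~df_2.columns.isin(df_1.columns)
ordered = df_2.columns[mask]
print(list(ordered))   # keeps df_2's order: Z before D

# For comparison, difference() returns the same labels sorted
print(list(df_2.columns.difference(df_1.columns)))
```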