Numpy Unique Could Not Filter Out Groups With The Same Value On A Specific Column
I tried to group a df and then select the groups whose size is > 1 and whose rows do not all have the same value in a specific column, df.groupby(['account_no', 'ext_id', 'amount']).filte
Solution 1:
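In my opinion, the problem is stray whitespace (leading or trailing) or something similar in the int_id values, which makes strings that look identical compare as different.
You can check for it: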
import numpy as np
import pandas as pd

df = pd.DataFrame({'account_no': ['a', 'a', 'a', 'a'],
                   'ext_id': [2665057, 2665057, 353724, 353724],
                   'amount': [439.50406200000003, 439.50406200000003, 2758.92, 2758.92],
                   'int_id': ['D000192', 'D000192', ' 952', '952']})
print (df)
  account_no       amount   ext_id   int_id
0          a   439.504062  2665057  D000192
1          a   439.504062  2665057  D000192
2          a  2758.920000   353724      952
3          a  2758.920000   353724      952
df1 = df.groupby(['account_no', 'ext_id', 'amount']).filter(lambda x: (len(x) > 1) & (np.unique(x.int_id).size != 1))
print (df1)
  account_no   amount  ext_id int_id
2          a  2758.92  353724    952
3          a  2758.92  353724    952
print (df1['int_id'].tolist())
[' 952', '952']
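The printed frame hides the extra space because pandas right-aligns string columns, so it can help to flag padded values explicitly. The following is only a sketch on the sample data above; the names has_padding and padded_rows are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'account_no': ['a', 'a', 'a', 'a'],
                   'ext_id': [2665057, 2665057, 353724, 353724],
                   'amount': [439.50406200000003, 439.50406200000003, 2758.92, 2758.92],
                   'int_id': ['D000192', 'D000192', ' 952', '952']})

# True where stripping changes the value, i.e. the string has
# leading or trailing whitespace (hypothetical helper name)
has_padding = df['int_id'] != df['int_id'].str.strip()

# Rows whose int_id carries stray whitespace
padded_rows = df[has_padding]

# repr() puts quotes around each value, so the extra space becomes visible
print(padded_rows['int_id'].map(repr).tolist())
```

If padded_rows is empty, whitespace is probably not the cause, and it may be worth checking for mixed dtypes (for example 952 stored as both int and str) instead.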
Then remove it with str.strip:
df['int_id'] = df['int_id'].str.strip()
df1 = df.groupby(['account_no', 'ext_id', 'amount']).filter(lambda x: (len(x) > 1) & (np.unique(x.int_id).size != 1))
print (df1)
Empty DataFrame
Columns: [account_no, amount, ext_id, int_id]
Index: []
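For reference, the same filter can probably be written with pandas' own Series.nunique instead of np.unique. This is a swapped-in alternative rather than the code from the question, shown as a minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'account_no': ['a', 'a', 'a', 'a'],
                   'ext_id': [2665057, 2665057, 353724, 353724],
                   'amount': [439.50406200000003, 439.50406200000003, 2758.92, 2758.92],
                   'int_id': ['D000192', 'D000192', ' 952', '952']})

# Normalise the key column first so ' 952' and '952' compare equal
df['int_id'] = df['int_id'].str.strip()

# Keep groups with more than one row whose int_id values are not all the same
df1 = df.groupby(['account_no', 'ext_id', 'amount']).filter(
    lambda g: len(g) > 1 and g['int_id'].nunique() != 1)

print(df1)
```

Note that nunique ignores NaN by default, so a group containing only missing int_id values would be kept by this version; whether that matters depends on the data.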