
Numpy Unique Could Not Filter Out Groups With The Same Value On A Specific Column

I tried to group a DataFrame with groupby and then select the groups whose values in a specific column are not all the same and whose group size is > 1, df.groupby(['account_no', 'ext_id', 'amount']).filte

Solution 1:

In my opinion the problem is stray whitespace (leading or trailing) or a similar invisible character in the int_id values.
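A minimal sketch of why this matters: np.unique compares strings exactly, so an ID with a leading space counts as a different value even though it prints the same. The array `values` is illustrative data, not from the question.

```python
import numpy as np

# Two IDs that look identical when printed, but one has a leading space.
values = np.array([' 952', '952'])

raw_count = np.unique(values).size                    # exact string comparison
print(raw_count)                                      # 2
print(values.tolist())                                # [' 952', '952'] reveals the space
clean_count = np.unique(np.char.strip(values)).size  # strip first, then compare
print(clean_count)                                    # 1
```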

You can check it:

import numpy as np
import pandas as pd

df = pd.DataFrame({'account_no': ['a', 'a', 'a', 'a'],
                   'ext_id': [2665057, 2665057, 353724, 353724],
                   'amount': [439.50406200000003, 439.50406200000003, 2758.92, 2758.92],
                   # note the leading space in ' 952'
                   'int_id': ['D000192', 'D000192', ' 952', '952']})
print (df)
  account_no       amount   ext_id   int_id
0          a   439.504062  2665057  D000192
1          a   439.504062  2665057  D000192
2          a  2758.920000   353724      952
3          a  2758.920000   353724      952

df1 = df.groupby(['account_no', 'ext_id', 'amount']).filter(lambda x: (len(x) > 1) & (np.unique(x.int_id).size != 1))
print (df1)
  account_no   amount  ext_id int_id
2          a  2758.92  353724    952
3          a  2758.92  353724    952

print (df1['int_id'].tolist())
[' 952', '952']
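To find the offending values without eyeballing the printout, one approach (a sketch on the same sample data; `has_ws` is just an illustrative name) is to compare each value with its stripped version:

```python
import pandas as pd

# Same sample data as above: row 2 of int_id has a leading space.
df = pd.DataFrame({'account_no': ['a', 'a', 'a', 'a'],
                   'ext_id': [2665057, 2665057, 353724, 353724],
                   'amount': [439.50406200000003, 439.50406200000003, 2758.92, 2758.92],
                   'int_id': ['D000192', 'D000192', ' 952', '952']})

# True where a value differs from its stripped form,
# i.e. it carries leading or trailing whitespace.
has_ws = df['int_id'] != df['int_id'].str.strip()
print(df.loc[has_ws, 'int_id'].tolist())  # [' 952']
```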

Then remove it with str.strip:

df['int_id'] = df['int_id'].str.strip()
df1 = df.groupby(['account_no', 'ext_id', 'amount']).filter(lambda x: (len(x) > 1) & (np.unique(x.int_id).size != 1))
print (df1)
Empty DataFrame
Columns: [account_no, amount, ext_id, int_id]
Index: []
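If you would rather not overwrite int_id, a variant is to strip inside the filter condition instead. The helper `groups_with_distinct_ids` below is hypothetical (not part of pandas), and it swaps np.unique for pandas' `nunique`, which gives the same answer here because the IDs are non-null strings.

```python
import pandas as pd

df = pd.DataFrame({'account_no': ['a', 'a', 'a', 'a'],
                   'ext_id': [2665057, 2665057, 353724, 353724],
                   'amount': [439.50406200000003, 439.50406200000003, 2758.92, 2758.92],
                   'int_id': ['D000192', 'D000192', ' 952', '952']})
keys = ['account_no', 'ext_id', 'amount']

def groups_with_distinct_ids(frame, keys, col):
    # Hypothetical helper: keep groups with more than one row whose
    # whitespace-stripped values in `col` are not all equal.
    return frame.groupby(keys).filter(
        lambda g: len(g) > 1 and g[col].str.strip().nunique() != 1
    )

df1 = groups_with_distinct_ids(df, keys, 'int_id')
print(df1.empty)  # True: ' 952' and '952' now count as the same ID

# Hypothetical variation: row 3 carries a genuinely different ID.
df2 = df.assign(int_id=['D000192', 'D000192', ' 952', '953'])
print(groups_with_distinct_ids(df2, keys, 'int_id').index.tolist())  # [2, 3]
```

The original int_id column is left untouched, so the raw values stay available for auditing.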
