
Drop The Last Row In A Group, Based On Condition

I want to drop the last row in a group based on a condition. I have done the following: df=pd.read_csv('file') grp = df.groupby('id') for idx, i in grp: df= df[df['column2'].in

Solution 1:

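To remove an in row that trails its group, combine the inverted Series.duplicated mask (~) with a not-equal-to-in test (Series.ne) using |. The filter keeps the first row of every id and every row whose product is not in, so an in row that appears after the first row of its group is dropped: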

df = df[~df['id'].duplicated() | df['product'].ne('in')]
print (df)
    id product        date
0  220      in  2014-09-01
1  220     out  2014-09-03
3  826      in  2014-11-11
4  826     out  2014-12-09
5  826     out  2014-05-19
6  901      in  2014-09-01
7  901     out  2014-10-05
8  901     out  2014-11-01
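
As a complement to the mask above, here is a minimal sketch of a filter that targets strictly the last row of each group. The sample frame and the names is_last and trimmed are assumptions made for illustration, not part of the original answer; the only pandas feature relied on is the keep='last' option of Series.duplicated.

```python
import pandas as pd

# Hypothetical sample data: group 220 ends with an 'in' row, group 826 does not.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826],
    "product": ["in", "out", "in", "in", "out", "out"],
    "date": ["2014-09-01", "2014-09-03", "2014-10-16",
             "2014-11-11", "2014-12-09", "2014-05-19"],
})

# True only for the last occurrence of each id in the current row order.
is_last = ~df["id"].duplicated(keep="last")

# Drop a row only when it is the last of its group AND its product is 'in'.
trimmed = df[~(is_last & df["product"].eq("in"))]
print(trimmed)
```

The two masks differ on a group such as in, in, out: the first-row mask drops the second in, while this sketch leaves the group untouched because its last row is out.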


If you want to keep every consecutive in-out pair within each group, use the following more general solution. The only extra step is mapping the non-numeric values in and out to numbers with a dict, because rolling does not work with strings:

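#more general solution
print(df)
     id product        date
0   220     out  2014-09-03
1   220     out  2014-09-03
2   220      in  2014-09-01
3   220     out  2014-09-03
4   220      in  2014-10-16
5   826      in  2014-11-11
6   826      in  2014-11-11
7   826     out  2014-12-09
8   826     out  2014-05-19
9   901      in  2014-09-01
10  901     out  2014-10-05
11  901      in  2014-09-01
12  901     out  2014-11-01

import numpy as np

pat = np.asarray(['in','out'])
N = len(pat)

d = {'in':0, 'out':1}
# flag every window of N rows inside a group that matches the pattern in, out,
# then back-fill the flag so the in row of each matching pair is kept as well
ma = (df['product'].map(d)
                   .groupby(df['id'])
                   .rolling(window=N , min_periods=N)
                   .apply(lambda x: (x==list(d.values())).all(), raw=False)
                   .mask(lambda x: x == 0)
                   .bfill(limit=N-1)
                   .fillna(0)
                   .astype(bool)
                   .reset_index(level=0, drop=True)
      )
df = df[ma]
print (df)
     id product        date
2   220      in  2014-09-01
3   220     out  2014-09-03
6   826      in  2014-11-11
7   826     out  2014-12-09
9   901      in  2014-09-01
10  901     out  2014-10-05
11  901      in  2014-09-01
12  901     out  2014-11-01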
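
The same in-out pairing can also be sketched without rolling, by comparing each row with its neighbours inside the group. This is a different technique from the one in the answer (it uses groupby with shift rather than a rolling window), and the names prev_product, next_product, starts_pair, ends_pair and pairs are assumptions chosen for illustration. The dates are left out of the sample frame for brevity.

```python
import pandas as pd

# Same id/product sequence as the sample above, without the date column.
df = pd.DataFrame({
    "id": [220] * 5 + [826] * 4 + [901] * 4,
    "product": ["out", "out", "in", "out", "in",
                "in", "in", "out", "out",
                "in", "out", "in", "out"],
})

# Neighbouring values within each id; NaN at the group edges.
grouped = df.groupby("id")["product"]
prev_product = grouped.shift(1)
next_product = grouped.shift(-1)

# An 'in' directly followed by 'out', or an 'out' directly preceded by 'in'.
starts_pair = df["product"].eq("in") & next_product.eq("out")
ends_pair = df["product"].eq("out") & prev_product.eq("in")

pairs = df[starts_pair | ends_pair]
print(pairs)
```

Because the comparison works on strings directly, no mapping to numbers is needed here; the trade-off is that the pattern length is fixed at two rows, whereas the rolling version generalises through N.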

Solution 2:

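If only the final row of the whole file needs to go, an easy way is to add skipfooter=1 when opening the .csv file. Note that this skips the last line unconditionally and once per file, not once per group: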

df = pd.read_csv(file, skipfooter=1, engine='python')
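
To make the effect of skipfooter concrete, the following minimal sketch reads the same CSV text twice from memory. The sample rows and the names csv_text, df_all and df_trim are assumptions for illustration; io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; joined without a trailing newline.
csv_text = "\n".join([
    "id,product,date",
    "220,in,2014-09-01",
    "220,out,2014-09-03",
    "826,in,2014-11-11",
])

# Plain read: all three data rows.
df_all = pd.read_csv(io.StringIO(csv_text))

# skipfooter=1 discards the last line of the file, whatever it contains.
# The option is only supported by the slower python engine.
df_trim = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(len(df_all), len(df_trim))
```

Only the final 826 row disappears here; the last row of group 220 is still present, which is why this approach does not help when each group needs its own check.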
