I Want To Filter Data For Excel Files Using Pandas

I am trying to filter data from Excel files in Pandas, based on a column value, i.e. a string value. I have tried the following to achieve what I want; the latest code is shown below as

Solution 1:

[Updated] - This is kind of weird, but it respects the rules you want to apply

(which are a little weird as well, so it makes sense)

1. Create the Dataframe

In [1]:
import pandas as pd
 
data = [
        [475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
        [476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
        [477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],                
        [478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],               
        [484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],           
        [487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],             
        [489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
        [490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
        [504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
        [506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
        [507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
        [508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],                
        [509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333]
       ]
df = pd.DataFrame(data, columns=['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']).set_index('Index')
df

Out [1]:

       Venue        A/C     DISPLAY     Date                    BID
Index                   
475    SHAWBURY     DAK     DISPLAY     2008-07-24 00:00:00     188
476    SHAWBURY     SPIT    DISPLAY     2008-07-24 00:00:00     188
477    COTTESMORE   SPIT    DISPLAY     None                    757
478    COTTESMORE   DAK     DISPLAY     None                    757
484    SUNDERLAND   SPIT    DISPLAY     None                    333
487    EAST FORTUNE SPIT    DISPLAY     None                    406
489    WINDERMERE   HS      DISPLAY     2008-07-25 00:00:00     138
490    WINDERMERE   DAK     DISPLAY     2008-07-25 00:00:00     138
504    WIGTON       DHS     DISPLAY     2008-07-26 00:00:00     144
506    WINDERMERE   HS      DISPLAY     2008-07-26 00:00:00     138
507    WINDERMERE   DAK     DISPLAY     2008-07-26 00:00:00     138
508    SUNDERLAND   HS      DISPLAY     None                    333
509    SUNDERLAND   DAK     DISPLAY     None                    333

2. Manipulate your dataframe

In [2]:
# Find the BIDs whose (BID, Venue, DISPLAY) group has at least 2 rows
test = df.groupby(by=['BID', 'Venue', 'DISPLAY']).count()
test = test[test['A/C'] > 1]
bids = test.reset_index().BID.tolist()

# Rows that already have `DHS` or `DS` in the column `A/C` are kept as they are
df.loc[df['A/C'] == 'DHS', 'Aircraft'] = 'DHS'
df.loc[df['A/C'] == 'DS', 'Aircraft'] = 'DS'

# For each BID with at least 2 rows, relabel `DAK` as `DHS` and `SPIT` as `DS`
for bid in bids:
    df.loc[(df['BID'] == bid) & (df['A/C'] == 'DAK'), 'Aircraft'] = 'DHS'
    df.loc[(df['BID'] == bid) & (df['A/C'] == 'SPIT'), 'Aircraft'] = 'DS'

# Keep only the rows that received an Aircraft value, then drop the old column
df = df[df['Aircraft'].notnull()].drop(columns=['A/C'])
df

Out [2]:

        Venue       DISPLAY     Date                BID     Aircraft
Index                   
475     SHAWBURY    DISPLAY     2008-07-24 00:00:00 188     DHS
476     SHAWBURY    DISPLAY     2008-07-24 00:00:00 188     DS
477     COTTESMORE  DISPLAY     None                757     DS
478     COTTESMORE  DISPLAY     None                757     DHS
484     SUNDERLAND  DISPLAY     None                333     DS
490     WINDERMERE  DISPLAY     2008-07-25 00:00:00 138     DHS
504     WIGTON      DISPLAY     2008-07-26 00:00:00 144     DHS
507     WINDERMERE  DISPLAY     2008-07-26 00:00:00 138     DHS
509     SUNDERLAND  DISPLAY     None                333     DHS
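
3. (Optional) The same rules without a loop

The loop in step 2 could probably also be written with boolean masks instead of iterating over the BIDs. This is only a sketch under the same rules as above, not a tested replacement for your real Excel data: the sample rows are trimmed to the columns the rules actually use (so `DISPLAY` and `Date` are left out of the grouping), and the names `group_size`, `in_multi_bid`, `keep_as_is`, `renamed` and `result` are just ones I picked for illustration.

```python
import pandas as pd

# Same sample rows as in step 1, trimmed to the columns the rules use.
rows = [
    (475, 'SHAWBURY', 'DAK', 188),
    (476, 'SHAWBURY', 'SPIT', 188),
    (477, 'COTTESMORE', 'SPIT', 757),
    (478, 'COTTESMORE', 'DAK', 757),
    (484, 'SUNDERLAND', 'SPIT', 333),
    (487, 'EAST FORTUNE', 'SPIT', 406),
    (489, 'WINDERMERE', 'HS', 138),
    (490, 'WINDERMERE', 'DAK', 138),
    (504, 'WIGTON', 'DHS', 144),
    (506, 'WINDERMERE', 'HS', 138),
    (507, 'WINDERMERE', 'DAK', 138),
    (508, 'SUNDERLAND', 'HS', 333),
    (509, 'SUNDERLAND', 'DAK', 333),
]
df = pd.DataFrame(rows, columns=['Index', 'Venue', 'A/C', 'BID']).set_index('Index')

# Size of each (BID, Venue) group, repeated on every row of that group.
group_size = df.groupby(['BID', 'Venue'])['A/C'].transform('size')

# True for every row whose BID appears in a group with at least 2 rows.
in_multi_bid = df['BID'].isin(df.loc[group_size > 1, 'BID'].unique())

# Rows that already hold 'DHS' or 'DS' keep their value; others become NaN.
keep_as_is = df['A/C'].where(df['A/C'].isin(['DHS', 'DS']))

# 'DAK' -> 'DHS' and 'SPIT' -> 'DS', but only inside the multi-row BIDs.
renamed = df['A/C'].map({'DAK': 'DHS', 'SPIT': 'DS'}).where(in_multi_bid)

# Combine both rules, then keep only the rows that received a value.
df['Aircraft'] = keep_as_is.fillna(renamed)
result = df[df['Aircraft'].notna()].drop(columns='A/C')
print(result)
```

On the sample data this should keep the same nine rows as Out [2]. The main difference is that the BID check happens once through `isin` rather than once per BID, which may matter if the real file has many BIDs.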
