I Want To Filter Data For Excel Files Using Pandas
I am trying to filter data from Excel files in Pandas, based on a column value, i.e. a string value. I have tried the following to achieve what I want; my latest code is shown below.
Solution 1:
[Updated] - This is kind of weird, but it respects the rules you want to apply
(which are a little weird as well, so it makes sense).
1. Create the Dataframe
In [1]:
import pandas as pd
data = [
[475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
[476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
[477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],
[478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],
[484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],
[487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],
[489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
[490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
[504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
[506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
[507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
[508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],
[509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333]
]
df = pd.DataFrame(data, columns=['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']).set_index('Index')
df
Out [1]:
              Venue   A/C  DISPLAY                 Date  BID
Index
475        SHAWBURY   DAK  DISPLAY  2008-07-24 00:00:00  188
476        SHAWBURY  SPIT  DISPLAY  2008-07-24 00:00:00  188
477      COTTESMORE  SPIT  DISPLAY                 None  757
478      COTTESMORE   DAK  DISPLAY                 None  757
484      SUNDERLAND  SPIT  DISPLAY                 None  333
487    EAST FORTUNE  SPIT  DISPLAY                 None  406
489      WINDERMERE    HS  DISPLAY  2008-07-25 00:00:00  138
490      WINDERMERE   DAK  DISPLAY  2008-07-25 00:00:00  138
504          WIGTON   DHS  DISPLAY  2008-07-26 00:00:00  144
506      WINDERMERE    HS  DISPLAY  2008-07-26 00:00:00  138
507      WINDERMERE   DAK  DISPLAY  2008-07-26 00:00:00  138
508      SUNDERLAND    HS  DISPLAY                 None  333
509      SUNDERLAND   DAK  DISPLAY                 None  333
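Since the question is about filtering on a string value in a column, here is a minimal sketch of the usual boolean-mask approaches before getting to the grouping rules. The frame `sample` and the variable names are made up for illustration; only the column names are taken from the data above.

```python
import pandas as pd

# Small stand-in frame (hypothetical) reusing the column names from above
sample = pd.DataFrame(
    {
        'Venue': ['SHAWBURY', 'SHAWBURY', 'WIGTON', 'WINDERMERE'],
        'A/C': ['DAK', 'SPIT', 'DHS', 'HS'],
        'BID': [188, 188, 144, 138],
    },
    index=pd.Index([475, 476, 504, 489], name='Index'),
)

# Exact match on a single string value
only_dak = sample[sample['A/C'] == 'DAK']

# Match any of several string values
dak_or_spit = sample[sample['A/C'].isin(['DAK', 'SPIT'])]

# Substring match; na=False treats missing values as non-matching
contains_hs = sample[sample['A/C'].str.contains('HS', na=False)]
```

Each expression returns a new DataFrame and leaves `sample` unchanged, so the masks can be combined with `&` and `|` as needed.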
2. Manipulate your dataframe
In [2]:
# Collect the BIDs whose (BID, Venue, DISPLAY) group has at least 2 rows
test = df.groupby(by=['BID', 'Venue', 'DISPLAY']).count()
test = test[test['A/C'] > 1]
bids = test.reset_index().BID.tolist()
# Here if there is already `DHS` or `DS` in the column `A/C`, I want to keep them
df.loc[df['A/C'] == 'DHS', 'Aircraft'] = 'DHS'
df.loc[df['A/C'] == 'DS', 'Aircraft'] = 'DS'
# For each BID that has at least 2 rows, relabel DAK as DHS and SPIT as DS
for bid in bids:
    df.loc[(df['BID'] == bid) & (df['A/C'] == 'DAK'), 'Aircraft'] = 'DHS'
    df.loc[(df['BID'] == bid) & (df['A/C'] == 'SPIT'), 'Aircraft'] = 'DS'
# Keep only the rows that received an Aircraft value, then drop the old column
df = df[df['Aircraft'].notnull()].drop(columns=['A/C'])
df
Out [2]:
            Venue  DISPLAY                 Date  BID  Aircraft
Index
475      SHAWBURY  DISPLAY  2008-07-24 00:00:00  188       DHS
476      SHAWBURY  DISPLAY  2008-07-24 00:00:00  188        DS
477    COTTESMORE  DISPLAY                 None  757        DS
478    COTTESMORE  DISPLAY                 None  757       DHS
484    SUNDERLAND  DISPLAY                 None  333        DS
490    WINDERMERE  DISPLAY  2008-07-25 00:00:00  138       DHS
504        WIGTON  DISPLAY  2008-07-26 00:00:00  144       DHS
507    WINDERMERE  DISPLAY  2008-07-26 00:00:00  138       DHS
509    SUNDERLAND  DISPLAY                 None  333       DHS
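The loop in step 2 could probably also be written without iterating over the BIDs. The sketch below is one possible vectorized variant, not the method from the answer: it uses `groupby(...).transform('size')` to attach each group's row count to every row, then `map` and `where` to build the new column. The names `rows`, `df2`, `group_size`, `renamed`, `kept` and `result` are made up for illustration. It assumes, as in the sample data, that each BID belongs to a single Venue/DISPLAY combination and that `A/C` has no missing values; otherwise the result may differ from the loop version.

```python
import pandas as pd

rows = [
    [475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
    [476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
    [477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],
    [478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],
    [484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],
    [487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],
    [489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
    [490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
    [504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
    [506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
    [507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
    [508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],
    [509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333],
]
cols = ['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']
df2 = pd.DataFrame(rows, columns=cols).set_index('Index')

# Number of rows in each (BID, Venue, DISPLAY) group, repeated on every row
group_size = df2.groupby(['BID', 'Venue', 'DISPLAY'])['A/C'].transform('size')

# DAK/SPIT are relabelled, but only inside groups with at least 2 rows;
# every other value becomes NaN
renamed = df2['A/C'].map({'DAK': 'DHS', 'SPIT': 'DS'}).where(group_size > 1)

# Values that are already DHS or DS are kept regardless of group size
kept = df2['A/C'].where(df2['A/C'].isin(['DHS', 'DS']))

result = (
    df2.assign(Aircraft=kept.fillna(renamed))
       .dropna(subset=['Aircraft'])   # drop rows with no Aircraft value
       .drop(columns=['A/C'])
)
```

On this sample data, `result` should contain the same nine rows as Out [2]. Unlike the loop, this version leaves the original frame untouched and builds a new one.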