
How To Get Top 5 Reason For Each Airline?

I need only the top 5 reasons for each airline. I managed to get the crosstab for all airlines, but it is not sorted and it displays all the reasons. How can I narrow my results?

Solution 1:

This is not the best solution but it does the job.

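top_n = 5
# Step 1: count how often each negativereason occurs per airline
gb = df.groupby(['airline', 'negativereason']).size().reset_index(name='freq')
# Step 2: keep the top_n most frequent reasons within each airline
df_tops = gb.groupby('airline').apply(lambda x: x.nlargest(top_n, ['freq'])).reset_index(drop=True)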

It takes two steps: first compute the frequency of each negativereason per airline, then take the top_n reasons by frequency within each airline.


Solution 2:

I managed to get the count of each negativereason for each airline, but I still can't get the top 5 results for each airline, sorted from highest to lowest.

count = df.groupby(['airline','negativereason']).size()
print(count) 

>airline         negativereason             
>American        Bad Flight                      87
>                Can't Tell                     198
>                Cancelled Flight               246
>                Customer Service Issue         768
>                Damaged Luggage                 12
>                Flight Attendant Complaints     87
>                Flight Booking Problems        130
>                Late Flight                    249
>                Lost Luggage                   149
>                longlines                       34
>Delta           Bad Flight                      64
>                Can't Tell                     186
>                Cancelled Flight                51
>                Customer Service Issue         199
>                Damaged Luggage                 11
>                Flight Attendant Complaints     60
>                Flight Booking Problems         44
>                Late Flight                    269
>                Lost Luggage                    57
>                longlines                       14
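
One possible way to go from these counts to a sorted top 5 per airline is to group the counted Series by its airline level and call nlargest. The sketch below rebuilds count from the American and Delta figures listed above so that it runs on its own; with the real data, count would come straight from the groupby. It assumes a reasonably recent pandas, where group_keys=False keeps the airline level from being added to the index a second time.

```python
import pandas as pd

reasons = [
    "Bad Flight", "Can't Tell", "Cancelled Flight", "Customer Service Issue",
    "Damaged Luggage", "Flight Attendant Complaints", "Flight Booking Problems",
    "Late Flight", "Lost Luggage", "longlines",
]
values = {
    "American": [87, 198, 246, 768, 12, 87, 130, 249, 149, 34],
    "Delta": [64, 186, 51, 199, 11, 60, 44, 269, 57, 14],
}

# Rebuild the (airline, negativereason) -> count Series shown above.
index = pd.MultiIndex.from_product(
    [list(values), reasons], names=["airline", "negativereason"]
)
count = pd.Series(values["American"] + values["Delta"], index=index)

# For each airline, keep the 5 largest counts, sorted from highest to lowest.
top5 = count.groupby(level="airline", group_keys=False).nlargest(5)
print(top5)
```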

Solution 3:

One approach:

Dataset

,Bad Flight,Cant Tell, Cancelled Flight,Customer Service Issue,Damaged Luggage,Flight Attendant Complaints,Flight Booking Problems,Late Flight,Lost Luggage,Longlines Airline
American,87,198,246,768,12,87,130,249,149,34
Delta,64,186,51,199,11,60,44,269,57,14
Southwest,90,159,162,391,14,38,61,152,90,29
US Airways,104,246,189,811,11,123,122,453,154,50
United,216,379,181,681,22,168,144,525,269,48

Code

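import pandas as pd

# Load the crosstab; the first column (airline names) becomes the index
air = pd.read_csv("airlines.csv", index_col=0)
print(air)
print()
# Sort American's counts from highest to lowest and keep the first five
american5 = air.loc["American"].sort_values(ascending=False).head(5)
print(american5)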

Output

            Bad Flight  Cant Tell   Cancelled Flight  Customer Service Issue  Damaged Luggage  Flight Attendant Complaints  Flight Booking Problems  Late Flight  Lost Luggage  Longlines Airline
American            87         198                246                     768               12                           87                      130          249           149                 34
Delta               64         186                 51                     199               11                           60                       44          269            57                 14
Southwest           90         159                162                     391               14                           38                       61          152            90                 29
US Airways         104         246                189                     811               11                          123                      122          453           154                 50
United             216         379                181                     681               22                          168                      144          525           269                 48

Customer Service Issue    768
Late Flight               249
Cancelled Flight          246
Cant Tell                 198
Lost Luggage              149
Name: American, dtype: int64
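
The code above handles a single airline. To cover every airline in the table, the same idea can be applied row by row. The following is a sketch under a few assumptions: the CSV text is inlined through io.StringIO instead of being read from airlines.csv, the header row is tidied (an "Airline" label for the first column, no stray spaces), and top5_by_airline is just an illustrative name.

```python
import io
import pandas as pd

# Inlined copy of the dataset with a tidied header row (an assumption).
csv_text = """Airline,Bad Flight,Cant Tell,Cancelled Flight,Customer Service Issue,Damaged Luggage,Flight Attendant Complaints,Flight Booking Problems,Late Flight,Lost Luggage,Longlines
American,87,198,246,768,12,87,130,249,149,34
Delta,64,186,51,199,11,60,44,269,57,14
Southwest,90,159,162,391,14,38,61,152,90,29
US Airways,104,246,189,811,11,123,122,453,154,50
United,216,379,181,681,22,168,144,525,269,48
"""
air = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Each row is one airline; nlargest returns its 5 biggest counts in descending order.
top5_by_airline = {airline: row.nlargest(5) for airline, row in air.iterrows()}

for airline, top5 in top5_by_airline.items():
    print(airline)
    print(top5.to_string())
    print()
```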
