
Pivot A Dataframe With Duplicate Values In Index

I have a pandas dataframe like this snapDate instance waitEvent AvgWaitInMs 0 2015-Jul-03 XX gc cr block 3-way 1 1 2015-Jun-2

Solution 1:

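You can use pivot_table, which aggregates duplicate index/column combinations (taking the mean by default) instead of raising an error the way pivot does: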

df.pivot_table(index=['snapDate','instance'], columns='waitEvent', values='AvgWaitInMs')

Out[64]:
waitEvent             gc cr block 3-way  gc current block 3-way  log file sync
snapDate    instance
2015-Jul-01 XX                      NaN                       2              8
            YY                      NaN                       2              9
2015-Jul-03 XX                        1                       2              8
            YY                      NaN                       1              9
2015-Jun-29 XX                      NaN                       2              8
            YY                      NaN                       2              8
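To make the handling of duplicates concrete, here is a minimal sketch. The three-row frame is made up for illustration (it is not the question's data) and deliberately repeats one (snapDate, instance, waitEvent) key; aggfunc and fill_value are spelled out only to show where the behaviour can be changed.

```python
import pandas as pd

# Hypothetical sample: the first two rows share the same key on purpose.
df = pd.DataFrame({
    'snapDate': ['2015-Jul-03', '2015-Jul-03', '2015-Jul-03'],
    'instance': ['XX', 'XX', 'YY'],
    'waitEvent': ['log file sync', 'log file sync', 'log file sync'],
    'AvgWaitInMs': [8, 10, 9],
})

# Duplicated keys are collapsed with aggfunc (mean is also the default);
# fill_value replaces combinations that have no data at all.
wide = df.pivot_table(index=['snapDate', 'instance'], columns='waitEvent',
                      values='AvgWaitInMs', aggfunc='mean', fill_value=0)
print(wide)
```

If the mean is not the right way to combine duplicates, another aggregation such as 'max' or 'sum' can be passed as aggfunc.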

Data:

I used the following text file as input (read with pandas read_csv to get the DataFrame):

snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
3;2015-Jun-29;XX;gc current block 3-way;2
4;2015-Jul-01;XX;gc current block 3-way;2
5;2015-Jul-01;YY;gc current block 3-way;2
6;2015-Jul-03;XX;gc current block 3-way;2
7;2015-Jul-03;YY;log file sync;9
8;2015-Jun-29;XX;log file sync;8
9;2015-Jul-03;XX;log file sync;8
10;2015-Jul-01;XX;log file sync;8
11;2015-Jul-01;YY;log file sync;9
12;2015-Jun-29;YY;log file sync;8
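The loading step could look roughly like this. It is a sketch that assumes ';' as the separator and reads the text from an in-memory string instead of a file on disk. Because the header has one field fewer than the data rows, read_csv uses the first column as the index.

```python
import io
import pandas as pd

# Same content as the text file above, kept in a string for a self-contained example.
raw = """snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
3;2015-Jun-29;XX;gc current block 3-way;2
4;2015-Jul-01;XX;gc current block 3-way;2
5;2015-Jul-01;YY;gc current block 3-way;2
6;2015-Jul-03;XX;gc current block 3-way;2
7;2015-Jul-03;YY;log file sync;9
8;2015-Jun-29;XX;log file sync;8
9;2015-Jul-03;XX;log file sync;8
10;2015-Jul-01;XX;log file sync;8
11;2015-Jul-01;YY;log file sync;9
12;2015-Jun-29;YY;log file sync;8
"""

# The header names four columns but each row has five fields,
# so the leading number becomes the row index.
df = pd.read_csv(io.StringIO(raw), sep=';')
print(df)
```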

Solution 2:

Here is one way to reshape the dataframe to something similar to what you want. Let me know if you have any additional specific requirements on the resulting dataframe.

import pandas as pd

# your data
# ====================================
print(df)

       snapDate instance               waitEvent  AvgWaitInMs
0                                                            
0   2015-Jul-03       XX       gc cr block 3-way            1
1   2015-Jun-29       YY  gc current block 3-way            2
2   2015-Jul-03       YY  gc current block 3-way            1
3   2015-Jun-29       XX  gc current block 3-way            2
4   2015-Jul-01       XX  gc current block 3-way            2
5   2015-Jul-01       YY  gc current block 3-way            2
6   2015-Jul-03       XX  gc current block 3-way            2
7   2015-Jul-03       YY           log file sync            9
8   2015-Jun-29       XX           log file sync            8
9   2015-Jul-03       XX           log file sync            8
10  2015-Jul-01       XX           log file sync            8
11  2015-Jul-01       YY           log file sync            9
12  2015-Jun-29       YY           log file sync            8

# processing
# ====================================
df_temp = df.set_index(['snapDate', 'instance', 'waitEvent']).unstack().fillna(0)

df_temp.columns = df_temp.columns.get_level_values(1).values

df_temp = df_temp.reset_index('instance')

print(df_temp)

            instance  gc cr block 3-way  gc current block 3-way  log file sync
snapDate                                                                      
2015-Jul-01       XX                  0                       2              8
2015-Jul-01       YY                  0                       2              9
2015-Jul-03       XX                  1                       2              8
2015-Jul-03       YY                  0                       1              9
2015-Jun-29       XX                  0                       2              8
2015-Jun-29       YY                  0                       2              8
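One caveat with the set_index/unstack route: it only works while each (snapDate, instance, waitEvent) combination occurs once. If a combination is repeated, unstack raises a ValueError. A possible workaround, sketched below on a made-up frame with one repeated key, is to aggregate with groupby first; using the mean here is an assumption, and any other aggregation would work the same way.

```python
import pandas as pd

# Hypothetical sample: the first two rows repeat the same key.
df = pd.DataFrame({
    'snapDate': ['2015-Jul-03', '2015-Jul-03', '2015-Jul-03', '2015-Jul-01'],
    'instance': ['XX', 'XX', 'XX', 'YY'],
    'waitEvent': ['log file sync', 'log file sync',
                  'gc cr block 3-way', 'log file sync'],
    'AvgWaitInMs': [8, 10, 1, 9],
})
keys = ['snapDate', 'instance', 'waitEvent']

# Plain unstack cannot reshape an index that contains duplicate entries.
try:
    df.set_index(keys).unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Collapse the duplicates first, then reshape as before.
df_temp = (df.groupby(keys)['AvgWaitInMs'].mean()
             .unstack('waitEvent')
             .fillna(0)
             .reset_index('instance'))
print(df_temp)
```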
