Pivot A Dataframe With Duplicate Values In Index
I have a pandas dataframe like this:
   snapDate     instance  waitEvent          AvgWaitInMs
0  2015-Jul-03  XX        gc cr block 3-way  1
1  2015-Jun-2
Solution 1:
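You can also use pivot_table. Unlike pivot, it aggregates duplicate entries (by default with the mean) instead of raising an error: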
df.pivot_table(index=['snapDate','instance'], columns='waitEvent', values='AvgWaitInMs')
Out[64]:
waitEvent             gc cr block 3-way  gc current block 3-way  log file sync
snapDate    instance
2015-Jul-01 XX                      NaN                       2              8
            YY                      NaN                       2              9
2015-Jul-03 XX                        1                       2              8
            YY                      NaN                       1              9
2015-Jun-29 XX                      NaN                       2              8
            YY                      NaN                       2              8
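To make the handling of duplicates concrete, here is a minimal sketch on a small made-up frame (the values are hypothetical, not taken from the question) in which two rows share the same snapDate, instance and waitEvent. The aggfunc argument controls how such rows are combined:

```python
import pandas as pd

# Hypothetical sample: the first two rows have the same
# (snapDate, instance, waitEvent) key, i.e. a duplicate entry.
df = pd.DataFrame({
    "snapDate": ["2015-Jul-03", "2015-Jul-03", "2015-Jul-03"],
    "instance": ["XX", "XX", "YY"],
    "waitEvent": ["log file sync", "log file sync", "log file sync"],
    "AvgWaitInMs": [8, 10, 7],
})

# aggfunc decides how duplicate entries are combined; "mean" is the default.
wide_mean = df.pivot_table(
    index=["snapDate", "instance"],
    columns="waitEvent",
    values="AvgWaitInMs",
    aggfunc="mean",
)

# Any other reduction can be used instead, for example the maximum.
wide_max = df.pivot_table(
    index=["snapDate", "instance"],
    columns="waitEvent",
    values="AvgWaitInMs",
    aggfunc="max",
)

print(wide_mean)
print(wide_max)
```

Which aggregation is appropriate depends on what a duplicate means in your data, so it is probably worth setting aggfunc explicitly rather than relying on the default.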
Data:
I used the following txt file as input (with read_csv from pandas to get the dataframe):
snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
3;2015-Jun-29;XX;gc current block 3-way;2
4;2015-Jul-01;XX;gc current block 3-way;2
5;2015-Jul-01;YY;gc current block 3-way;2
6;2015-Jul-03;XX;gc current block 3-way;2
7;2015-Jul-03;YY;log file sync;9
8;2015-Jun-29;XX;log file sync;8
9;2015-Jul-03;XX;log file sync;8
10;2015-Jul-01;XX;log file sync;8
11;2015-Jul-01;YY;log file sync;9
12;2015-Jun-29;YY;log file sync;8
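The exact read_csv call is not shown here. A minimal sketch, assuming a semicolon separator and using only the first rows of the file (read from a string instead of a file path so the snippet is self-contained), could look like this:

```python
import io
import pandas as pd

# First rows of the input above; in practice you would pass a file path.
raw = """snapDate;instance;waitEvent;AvgWaitInMs
0;2015-Jul-03;XX;gc cr block 3-way;1
1;2015-Jun-29;YY;gc current block 3-way;2
2;2015-Jul-03;YY;gc current block 3-way;1
"""

# The header has one fewer field than the data rows, so pandas
# uses the first column of each row as the index.
df = pd.read_csv(io.StringIO(raw), sep=";")
print(df)
```

If your file has a header field for the first column as well, passing index_col=0 should give the same layout.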
Solution 2:
Here is one way to reshape the dataframe to something similar to what you want. Let me know if you have any additional specific requirements on the resulting dataframe.
import pandas as pd
# your data
# ====================================
print(df)
snapDate instance waitEvent AvgWaitInMs
0
0 2015-Jul-03 XX gc cr block 3-way 1
1 2015-Jun-29 YY gc current block 3-way 2
2 2015-Jul-03 YY gc current block 3-way 1
3 2015-Jun-29 XX gc current block 3-way 2
4 2015-Jul-01 XX gc current block 3-way 2
5 2015-Jul-01 YY gc current block 3-way 2
6 2015-Jul-03 XX gc current block 3-way 2
7 2015-Jul-03 YY log file sync 9
8 2015-Jun-29 XX log file sync 8
9 2015-Jul-03 XX log file sync 8
10 2015-Jul-01 XX log file sync 8
11 2015-Jul-01 YY log file sync 9
12 2015-Jun-29 YY log file sync 8
# processing
# ====================================
df_temp = df.set_index(['snapDate', 'instance', 'waitEvent']).unstack().fillna(0)
df_temp.columns = df_temp.columns.get_level_values(1).values
df_temp = df_temp.reset_index('instance')
print(df_temp)
instance gc cr block 3-way gc current block 3-way log file sync
snapDate
2015-Jul-01 XX 0 2 8
2015-Jul-01 YY 0 2 9
2015-Jul-03 XX 1 2 8
2015-Jul-03 YY 0 1 9
2015-Jun-29 XX 0 2 8
2015-Jun-29 YY 0 2 8
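Note that set_index(...).unstack() only works when each (snapDate, instance, waitEvent) combination occurs once; with repeated combinations it raises a ValueError. A possible workaround, sketched here on a small made-up frame (hypothetical values) and assuming the mean is an acceptable way to combine duplicates, is to aggregate with groupby before unstacking:

```python
import pandas as pd

# Hypothetical sample: the first two rows share the same key.
df = pd.DataFrame({
    "snapDate": ["2015-Jul-03"] * 4,
    "instance": ["XX", "XX", "YY", "XX"],
    "waitEvent": ["log file sync", "log file sync",
                  "log file sync", "gc cr block 3-way"],
    "AvgWaitInMs": [8, 10, 9, 1],
})

keys = ["snapDate", "instance", "waitEvent"]

# Unstacking directly fails because the index contains duplicate entries.
failed = False
try:
    df.set_index(keys)["AvgWaitInMs"].unstack()
except ValueError as err:
    failed = True
    print("unstack failed:", err)

# Collapse duplicates first (mean is an assumption), then reshape as before.
wide = (
    df.groupby(keys)["AvgWaitInMs"].mean()
      .unstack("waitEvent")
      .fillna(0)
      .reset_index("instance")
)
print(wide)
```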