Strange Behavior When Trying To Append A Row To Each Group In A Group By Object
This question is about a function behaving in an unexpected manner when applied on two different dataframes - more precisely, groupby objects. Either I'm missing something that is
Solution 1:
After a lot of debugging, I found the problem.
It is caused by a duplicated label in the third (innermost) index level.
In your last sample the shape of the group is 2, so the function writes to label 2,
but that label already exists in the group, so no new row was added; the existing row was only overwritten.
            ID    SEQ                        DTM STATUS
ID SEQ
C1 572 0    C1  572.0 2017-05-09 10:13:00.000000     PE
       1    C1  572.0 2017-05-09 12:24:00.000000     OK
       2   NaN    NaN 2017-07-06 08:46:02.341472    NaN
   579 2    C1  579.0 2017-07-06 08:46:02.341472     PE  <- overwritten values in row
       3    C1  579.0 2017-05-09 13:25:00.000000     OK
   587 4    C1  587.0 2017-05-09 10:20:00.000000     PE
       5    C1  587.0 2017-05-09 12:25:00.000000     OK
       2   NaN    NaN 2017-07-06 08:46:02.341472    NaN
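The overwrite can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical two-row frame `g` that stands in for one group and keeps its original labels 2 and 3; it is not the data from the question. It shows that `.loc` writes into an existing row when the label is already present and only enlarges the frame when the label is new.

```python
import pandas as pd

# Hypothetical stand-in for one group: rows keep their original labels 2 and 3.
g = pd.DataFrame({"SEQ": [579.0, 579.0], "STATUS": ["PE", "OK"]}, index=[2, 3])

before = len(g)

# g.shape[0] is 2 and label 2 already exists, so this writes into that row.
g.loc[g.shape[0], "STATUS"] = "changed"
after_existing = len(g)

# Label 99 does not exist, so .loc enlarges the frame by one row.
g.loc[99, "STATUS"] = "new"
after_new = len(g)

print(before, after_existing, after_new)
```

The row count only grows in the second assignment, which matches the group in the output above where a row was overwritten rather than appended.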
Your first sample worked because its second group has only one row.
But if a group has 2 rows:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)
print(a)
                     0
first second
bar   one     0.366258
      two     0.583205
      two     0.159388
baz   one     0.598198
      two     0.274027
foo   one     0.086461
      two     0.353577
      two     0.823377
qux   one     0.098737
      two     0.128470
the same problem appears after applying the function:
print(a)
               first second         0                        DTM
first second
bar   one    0   bar    one  0.366258                        NaT
             1   NaN    NaN       NaN 2017-07-06 08:47:55.610671
      two    1   bar    two  0.583205                        NaT
             2   bar    two  0.159388 2017-07-06 08:47:55.610671  <- overwritten
baz   one    3   baz    one  0.598198                        NaT
             1   NaN    NaN       NaN 2017-07-06 08:47:55.610671
      two    4   baz    two  0.274027                        NaT
So if the function is changed slightly to use a label that cannot already exist in the group, everything works:
import numpy as np
import pandas as pd

now = pd.Timestamp.now()  # pd.datetime is no longer available in recent pandas versions

def myfunction(g, now):
    # A string label such as '2a' cannot collide with the existing integer labels
    g.loc[str(g.shape[0]) + 'a', 'DTM'] = now
    return g

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)
print(a)
a = a.reset_index().groupby(['first', 'second']).apply(lambda x: myfunction(x, now))
print(a)
                first second         0                        DTM
first second
bar   one    0    bar    one  0.677641                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
      two    1    bar    two  0.274588                        NaT
             2    bar    two  0.524903                        NaT
             2a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
baz   one    3    baz    one  0.198272                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
      two    4    baz    two  0.787949                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
foo   one    5    foo    one  0.484197                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
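If mixing string and integer labels in the index is not wanted, another option is to pick an integer label that is guaranteed to be missing from the group. The following is only a sketch of that idea, not the approach used above: the helper name `append_timestamp_row`, the column name `value`, and the fixed timestamp are all assumptions made for illustration, and the groups are processed with a plain loop plus `pd.concat` instead of `groupby.apply`.

```python
import pandas as pd

def append_timestamp_row(g, now):
    # Hypothetical helper: work on a copy so the original group is not mutated.
    g = g.copy()
    # max label + 1 is never present in this group, so .loc always enlarges.
    new_label = g.index.max() + 1
    g.loc[new_label, "DTM"] = now
    return g

# Small illustrative frame (assumed data, not the frame from the question).
df = pd.DataFrame({
    "first": ["bar", "bar", "bar", "baz"],
    "second": ["one", "two", "two", "one"],
    "value": [0.1, 0.2, 0.3, 0.4],
})
now = pd.Timestamp("2017-07-06 08:54:47")

# Process each group separately, then stack the pieces; the dict keys
# (the group tuples) become the outer index levels of the result.
pieces = {key: append_timestamp_row(g, now)
          for key, g in df.groupby(["first", "second"])}
out = pd.concat(pieces)
print(out)
```

Each of the three groups gains exactly one row here, including the two-row group that would have been overwritten when the label was `g.shape[0]`.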