Strange Behavior When Trying To Append A Row To Each Group In A Group By Object
This question is about a function behaving in an unexpected manner when applied to two different dataframes (more precisely, groupby objects). Either I'm missing something that is
Solution 1:
After a lot of debugging, I found the problem.
The problem is a repeated number in the third index level: in your last sample the second group has 2 rows, so g.shape[0] is 2, but the row label 2 already exists in that group. So no new row was added; the existing row was only overwritten.
             ID    SEQ                        DTM STATUS
ID SEQ                                                  
C1 572 0     C1  572.0 2017-05-09 10:13:00.000000     PE
       1     C1  572.0 2017-05-09 12:24:00.000000     OK
       2    NaN    NaN 2017-07-06 08:46:02.341472    NaN
   579 2     C1  579.0 2017-07-06 08:46:02.341472     PE  <- overwritten values in row
       3     C1  579.0 2017-05-09 13:25:00.000000     OK
   587 4     C1  587.0 2017-05-09 10:20:00.000000     PE
       5     C1  587.0 2017-05-09 12:25:00.000000     OK
       2    NaN    NaN 2017-07-06 08:46:02.341472    NaN
The first sample worked only because its second group has just one row.
But if a group has 2 rows:
import numpy as np
import pandas as pd

# 'bar two' and 'foo two' each contain two rows
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)
print(a)
                     0
first second          
bar   one     0.366258
      two     0.583205
      two     0.159388
baz   one     0.598198
      two     0.274027
foo   one     0.086461
      two     0.353577
      two     0.823377
qux   one     0.098737
      two     0.128470
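A quick way to see in advance which groups are affected (a sketch, not part of the original answer) is to check, for every group, whether the label g.shape[0] is already one of that group's row labels after reset_index(). This assumes, as the fixed version of the function suggests, that the original function wrote the new row to the label g.shape[0]; the variable names flat and collisions are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample from above; the random values do not affect the check.
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)

# reset_index() gives every row a global integer label 0..9.
flat = a.reset_index()

# A group collides when the label g.shape[0] (assumed to be what the original
# function writes to) is already one of that group's own row labels.
collisions = [key for key, g in flat.groupby(['first', 'second'])
              if g.shape[0] in g.index]
print(collisions)  # [('bar', 'two')]
```

Only ('bar', 'two') collides: its rows carry the labels 1 and 2, so label 2 is taken. 'foo two' also has 2 rows, but its labels are 6 and 7, so it gets a genuinely new row.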
Applying the original function gives the same problem:
print(a)
                  first second         0                        DTM
first second                                                       
bar   one    0      bar    one  0.366258                        NaT
             1      NaN    NaN       NaN 2017-07-06 08:47:55.610671
      two    1      bar    two  0.583205                        NaT
             2      bar    two  0.159388 2017-07-06 08:47:55.610671  <- overwritten
baz   one    3      baz    one  0.598198                        NaT
             1      NaN    NaN       NaN 2017-07-06 08:47:55.610671
      two    4      baz    two  0.274027                        NaT
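The overwrite comes from how .loc assignment treats row labels: if the label already exists, that row is updated in place; if it does not, pandas enlarges the frame with a new row ("setting with enlargement"). A minimal sketch on a hypothetical two-row frame labelled 1 and 2, mimicking the 'bar two' group after reset_index() (the column x and the timestamp are made up for illustration):

```python
import pandas as pd

# One group as it looks after reset_index(): global row labels 1 and 2.
g = pd.DataFrame({'x': [10, 20]}, index=[1, 2])
now = pd.Timestamp('2017-07-06 08:47:55')

# g.shape[0] == 2 is an existing label, so .loc writes into row 2
# (creating the DTM column) instead of adding a row.
g.loc[g.shape[0], 'DTM'] = now
rows_after_collision = g.shape[0]
print(rows_after_collision)  # 2

# A label that is not in the index yet enlarges the frame by one row.
g.loc[3, 'DTM'] = now
print(g.shape[0])            # 3
print(list(g.index))         # [1, 2, 3]
```

Row 2 keeps its x value and only gains a DTM entry, which is exactly the "overwritten" row in the output above.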
So if the function is changed slightly, so that the new label can never match an existing integer label, everything works:
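import numpy as np
import pandas as pd

# pd.datetime is deprecated (removed in pandas 2.0); pd.Timestamp.now() is a replacement
now = pd.Timestamp.now()

def myfunction(g, now):
    # A label such as '2a' cannot clash with the integer labels created by
    # reset_index(), so .loc always adds a new row instead of overwriting one
    g.loc[str(g.shape[0]) + 'a', 'DTM'] = now
    return g

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)
print(a)
a = a.reset_index().groupby(['first', 'second']).apply(lambda x: myfunction(x, now))
print(a)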
                  first second         0                        DTM
first second                                                       
bar   one    0      bar    one  0.677641                        NaT
             1a     NaN    NaN       NaN 2017-07-06 08:54:47.481671
      two    1      bar    two  0.274588                        NaT
             2      bar    two  0.524903                        NaT
             2a     NaN    NaN       NaN 2017-07-06 08:54:47.481671
baz   one    3      baz    one  0.198272                        NaT
             1a     NaN    NaN       NaN 2017-07-06 08:54:47.481671
      two    4      baz    two  0.787949                        NaT
             1a     NaN    NaN       NaN 2017-07-06 08:54:47.481671
foo   one    5      foo    one  0.484197                        NaT
             1a     NaN    NaN       NaN 2017-07-06 08:54:47.481671
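A different way to get the same effect, not the method used in this answer, is to build the extra row as a one-row frame and attach it with pd.concat. Unlike .loc assignment, concat always appends, so it cannot overwrite an existing row. A sketch under that assumption; append_timestamp_row is a hypothetical helper name, and it reuses the answer's str(g.shape[0]) + 'a' labelling so the result looks like the output above:

```python
import numpy as np
import pandas as pd

def append_timestamp_row(g, now):
    """Hypothetical helper: return g plus one extra row holding `now` in DTM.

    pd.concat always adds the row, so an existing row is never overwritten.
    """
    extra = pd.DataFrame({'DTM': [now]}, index=[str(g.shape[0]) + 'a'])
    return pd.concat([g, extra])

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)

now = pd.Timestamp.now()
result = (a.reset_index()
           .groupby(['first', 'second'])
           .apply(lambda g: append_timestamp_row(g, now)))
print(result)
```

With 10 original rows in 8 groups, the result should have 18 rows: every original value survives and each group gains exactly one DTM row. The trade-off is creating a small frame per group, which is negligible for data of this size.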