
Strange Behavior When Trying To Append A Row To Each Group In A Group By Object

This question is about a function behaving unexpectedly when applied to two different DataFrames (more precisely, groupby objects). Either I'm missing something that is

Solution 1:

After a lot of debugging, I found the problem.

The problem is a repeated number in the third index level (the row label). In your last sample the group has 2 rows, so the function writes to label 2, but that label already exists in the group. No new row is added; the existing row is overwritten instead.
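A minimal sketch of the mechanism, assuming a toy group with made-up timestamps: `.loc` with a label that already exists updates that row in place, while a label that does not exist yet appends a new row (setting with enlargement).

```python
import pandas as pd

# Toy group with integer labels 2 and 3, like the SEQ 579 group below
# (the DTM values are made up for illustration).
g = pd.DataFrame(
    {"DTM": pd.to_datetime(["2017-05-09 12:00", "2017-05-09 13:25"])},
    index=[2, 3],
)
now = pd.Timestamp("2017-07-06 08:46")

# g.shape[0] == 2 and label 2 already exists, so .loc overwrites that row.
overwritten = g.copy()
overwritten.loc[overwritten.shape[0], "DTM"] = now
print(len(overwritten))  # 2: no row was added

# With labels 4 and 5 (like the SEQ 587 group), label 2 is new,
# so the same assignment appends a row instead.
appended = g.set_axis([4, 5]).copy()
appended.loc[appended.shape[0], "DTM"] = now
print(len(appended))  # 3: a new row labelled 2 was added
```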

                ID    SEQ                        DTM STATUS
ID SEQ                                                     
C1 572 0        C1  572.0 2017-05-09 10:13:00.000000     PE
       1        C1  572.0 2017-05-09 12:24:00.000000     OK
       2       NaN    NaN 2017-07-06 08:46:02.341472    NaN
   579 2        C1  579.0 2017-07-06 08:46:02.341472     PE   <- overwritten values in row
       3        C1  579.0 2017-05-09 13:25:00.000000     OK
   587 4        C1  587.0 2017-05-09 10:20:00.000000     PE
       5        C1  587.0 2017-05-09 12:25:00.000000     OK
       2       NaN    NaN 2017-07-06 08:46:02.341472    NaN

The first sample worked because its second group had only one row.

But if that group has 2 rows:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)
print (a)
                     0
first second          
bar   one     0.366258
      two     0.583205
      two     0.159388
baz   one     0.598198
      two     0.274027
foo   one     0.086461
      two     0.353577
      two     0.823377
qux   one     0.098737
      two     0.128470

The same problem appears:

print (a)
                first second         0                        DTM
first second                                                      
bar   one    0    bar    one  0.366258                        NaT
             1    NaN    NaN       NaN 2017-07-06 08:47:55.610671
      two    1    bar    two  0.583205                        NaT
             2    bar    two  0.159388 2017-07-06 08:47:55.610671   <- overwritten
baz   one    3    baz    one  0.598198                        NaT
             1    NaN    NaN       NaN 2017-07-06 08:47:55.610671
      two    4    baz    two  0.274027                        NaT
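To see in advance which groups will be hit, a small check like the one below can be run on the same 10-row sample (a sketch; `flat` and `collisions` are just illustrative names). It tests whether each group's row count, which is the label the original function writes to, is already one of that group's own labels. With this index only `('bar', 'two')` collides, which matches the overwritten row above.

```python
import numpy as np
import pandas as pd

# Same MultiIndex as above; the random values do not affect the check.
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)

flat = a.reset_index()  # rows now carry the integer labels 0..9

# A group collides when its size is already one of its own row labels,
# because then .loc[g.shape[0], ...] updates that row instead of appending.
collisions = [
    key
    for key, g in flat.groupby(['first', 'second'])
    if g.shape[0] in g.index
]
print(collisions)  # [('bar', 'two')]
```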

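So if the function is changed slightly, so that the new label is a string such as '1a' that can never match an existing integer label, everything works: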

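import numpy as np
import pandas as pd

# pd.datetime was deprecated and later removed from pandas; use pd.Timestamp
now = pd.Timestamp.now()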

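def myfunction(g, now):
    # The new label ('1a', '2a', ...) is a string built from the group size,
    # so it never coincides with the integer labels left by reset_index and
    # .loc always appends a new row instead of overwriting an existing one.
    g.loc[str(g.shape[0]) + 'a', 'DTM'] = now
    return g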

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index = index)
print (a)


a = a.reset_index().groupby(['first', 'second']).apply(lambda x: myfunction(x, now))
print (a)
                first second         0                        DTM
first second                                                      
bar   one    0    bar    one  0.677641                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
      two    1    bar    two  0.274588                        NaT
             2    bar    two  0.524903                        NaT
             2a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
baz   one    3    baz    one  0.198272                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
      two    4    baz    two  0.787949                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
foo   one    5    foo    one  0.484197                        NaT
             1a   NaN    NaN       NaN 2017-07-06 08:54:47.481671
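A different technique, not the one used in this answer: instead of writing through `.loc`, each group can be extended with `pd.concat`, which avoids label collisions altogether. This is a sketch assuming the same sample data and a hypothetical helper `append_dtm_row`; it groups by index level rather than by columns after `reset_index`, and `ignore_index=True` renumbers each group's rows from 0, so the original row labels are dropped.

```python
import numpy as np
import pandas as pd

now = pd.Timestamp.now()


def append_dtm_row(g, now):
    # Hypothetical helper: build a one-row frame holding the timestamp and
    # concatenate it, so no existing row can ever be overwritten.
    extra = pd.DataFrame({'DTM': [now]})
    return pd.concat([g, extra], ignore_index=True)


arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'two', 'one', 'two', 'one', 'two', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
a = pd.DataFrame(np.random.random((10,)), index=index)

# group_keys=True puts (first, second) back in front of each group's 0..n labels.
out = (a.groupby(level=['first', 'second'], group_keys=True)
        .apply(lambda x: append_dtm_row(x, now)))
print(out)
```

Every group, including `('bar', 'two')`, now gets exactly one extra row, so the result has 10 original rows plus one per group (8 groups).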
