
How To Merge/combine Columns In Pandas?

I have a (example-) dataframe with 4 columns: data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'], 'B': [42, 52, np.nan, np.nan, np.nan, np.nan], 'C': [np.nan, np.nan, 31, 2, np.

Solution 1:

Option 1 Using assign and drop

In [644]: cols = ['B', 'C', 'D']

In [645]: df.assign(E=df[cols].sum(axis=1)).drop(cols, axis=1)
Out[645]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

Option 2 Using assignment and drop

In [648]: df['E'] = df[cols].sum(1)

In [649]: df = df.drop(cols, axis=1)

In [650]: df
Out[650]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0
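
For reference, here is a minimal self-contained sketch of the sum-and-drop idea behind Options 1 and 2. The sample frame is an assumption reconstructed from the outputs shown, since the question's data is truncated. The min_count=1 argument is also an addition that is not in the answer above: it makes a row with no values at all come out as NaN instead of 0.

```python
import numpy as np
import pandas as pd

# Assumed sample data: exactly one non-null value per row across B, C, D.
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 62, 70],
})

cols = ['B', 'C', 'D']

# The row-wise sum skips NaN; min_count=1 keeps an all-NaN row as NaN.
result = df.assign(E=df[cols].sum(axis=1, min_count=1)).drop(columns=cols)
print(result)
```

Passing axis and columns by keyword keeps the calls readable and avoids relying on positional arguments.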

Option 3 Using groupby (lately, this is the option I prefer)

In [660]: df.groupby(np.where(df.columns =='A', 'A', 'E'), axis=1).first() #or sum max min
Out[660]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

In [661]: df.columns =='A'
Out[661]: array([ True, False, False, False], dtype=bool)

In [662]: np.where(df.columns =='A', 'A', 'E')
Out[662]:
array(['A', 'E', 'E', 'E'],
      dtype='|S1')
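
Recent pandas releases warn that grouping with axis=1 is deprecated. A sketch of the same idea that avoids that argument is to transpose, group the rows by the same labels, and transpose back. The sample frame is an assumption reconstructed from the outputs shown (the question's data is truncated). Transposing a mixed-type frame yields object columns, hence the final pd.to_numeric call.

```python
import numpy as np
import pandas as pd

# Assumed sample data: exactly one non-null value per row across B, C, D.
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 62, 70],
})

# One label per original column: 'A' stays 'A', everything else becomes 'E'.
labels = np.where(df.columns == 'A', 'A', 'E')

# After the transpose the original columns are rows, so a plain row-wise
# groupby applies; first() takes the first non-null value in each group.
out = df.T.groupby(labels).first().T

# The transpose produced object dtype, so convert E back to numbers.
out['E'] = pd.to_numeric(out['E'])
print(out)
```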

Solution 2:

Use difference for columns names without A and then get sum or max:

cols = df.columns.difference(['A'])
df['E'] = df[cols].sum(axis=1).astype(int)
# df['E'] = df[cols].max(axis=1).astype(int)
df = df.drop(cols, axis=1)
print (df)
   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70
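
One caveat with astype(int): for a row with no values at all, sum(axis=1) returns 0 and max(axis=1) returns NaN, and astype(int) cannot convert NaN. The sketch below shows the nullable 'Int64' dtype as one possible way around that. The small frame with an empty row 'g' and the min_count=1 argument are assumptions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 'g' has no value in any of the columns to merge.
df = pd.DataFrame({
    'A': ['a', 'c', 'g'],
    'B': [42, np.nan, np.nan],
    'C': [np.nan, 31, np.nan],
})

cols = df.columns.difference(['A'])

# min_count=1 keeps the empty row as NaN; 'Int64' stores it as <NA>.
df['E'] = df[cols].sum(axis=1, min_count=1).astype('Int64')
df = df.drop(cols, axis=1)
print(df)
```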

If multiple values per rows:

data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],  
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [10, np.nan, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(data, columns = ['A', 'B', 'C', 'D'])

print (df)
   A     B     C     D
0  a  42.0   NaN  10.0
1  b  52.0   NaN   NaN
2  c   NaN  31.0   NaN
3  d   NaN   2.0   NaN
4  e   NaN   NaN  62.0
5  f   NaN   NaN  70.0

cols = df.columns.difference(['A'])
df['E'] = df[cols].apply(lambda x: ', '.join(x.dropna().astype(int).astype(str)), 1)
df = df.drop(cols, axis=1)
print (df)
   A       E
0  a  42, 10
1  b      52
2  c      31
3  d       2
4  e      62
5  f      70

Solution 3:

You can also use ffill with iloc:

df['E'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1].astype(int)
df = df.iloc[:, [0, -1]]

print(df)

   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70
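
The iloc version depends on where the columns sit in the frame. A variant that selects them by name instead might look like the sketch below. The sample frame is an assumption reconstructed from the outputs shown, since the question's data is truncated.

```python
import numpy as np
import pandas as pd

# Assumed sample data: exactly one non-null value per row across B, C, D.
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 62, 70],
})

cols = ['B', 'C', 'D']

# Forward-fill across each row, so the last column ends up holding the
# last non-null value of that row, then keep only that last column.
df['E'] = df[cols].ffill(axis=1).iloc[:, -1].astype(int)
df = df.drop(columns=cols)
print(df)
```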

Solution 4:

Zero's third option (groupby, in Solution 1) requires a numpy import and only handles one column outside the set of columns to collapse, while jpp's answer (ffill, in Solution 3) requires that you know how the columns are ordered. Here's a solution that has no extra dependencies, takes an arbitrary input dataframe, and only collapses the columns if every row has exactly one value across them:

import pandas as pd

data = [{'A':'a', 'B':42, 'messy':'z'},
    {'A':'b', 'B':52, 'messy':'y'},
    {'A':'c', 'C':31},
    {'A':'d', 'C':2, 'messy':'w'},
    {'A':'e', 'D':62, 'messy':'v'},
    {'A':'f', 'D':70, 'messy':['z']}]
df = pd.DataFrame(data)

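cols = ['B', 'C', 'D']
new_col = 'E'

# work on a copy so the original df can still be printed unchanged
df2 = df.copy()

# collapse only if every row has exactly one non-null value across cols
if df[cols].apply(lambda x: x.notna().sum() == 1, axis=1).all():
    df2[new_col] = df[cols].ffill(axis=1).dropna(axis=1)

df2 = df2.drop(columns=cols)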

print(df, '\n\n', df2)

Output:

   A     B messy     C     D
0  a  42.0     z   NaN   NaN
1  b  52.0     y   NaN   NaN
2  c   NaN   NaN  31.0   NaN
3  d   NaN     w   2.0   NaN
4  e   NaN     v   NaN  62.0
5  f   NaN   [z]   NaN  70.0

   A messy     E
0  a     z  42.0
1  b     y  52.0
2  c   NaN  31.0
3  d     w   2.0
4  e     v  62.0
5  f   [z]  70.0
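
The same check-then-collapse idea can be wrapped in a small function. This is only a sketch: collapse_columns is a hypothetical helper name, and unlike the code above it leaves the frame untouched (rather than dropping the columns anyway) when some row does not have exactly one value.

```python
import pandas as pd


def collapse_columns(df, cols, new_col):
    """Hypothetical helper: merge cols into new_col when safe.

    The merge happens only if every row has exactly one non-null value
    across cols; otherwise an unchanged copy is returned.
    """
    out = df.copy()
    if out[cols].notna().sum(axis=1).eq(1).all():
        # Back-fill across each row so the first column holds the value.
        out[new_col] = out[cols].bfill(axis=1).iloc[:, 0]
        out = out.drop(columns=cols)
    return out


data = [{'A': 'a', 'B': 42, 'messy': 'z'},
        {'A': 'b', 'B': 52, 'messy': 'y'},
        {'A': 'c', 'C': 31},
        {'A': 'd', 'C': 2, 'messy': 'w'},
        {'A': 'e', 'D': 62, 'messy': 'v'},
        {'A': 'f', 'D': 70, 'messy': ['z']}]
df = pd.DataFrame(data)

df2 = collapse_columns(df, ['B', 'C', 'D'], 'E')
print(df2)
```

Counting non-null values per row with notna().sum(axis=1) states the "exactly one value" condition directly and avoids a Python-level apply.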
