
Pandas Sort Columns And Find Difference

I have a DataFrame and I want to sort it so that matching values in column A and column B end up on the same row. If a value has no match, it should go into a third column, C. My data looks like filenamesLocal FilenamesServer

Solution 1:

Setup

import pandas as pd
from io import StringIO

text="""filenamesLocal          FilenamesServer
  filea.csv                  fileab.csv
  filec.csv                  filea.csv
  fileab.csv                 filec.csv
  filexyz.csv
  fileyh.csv"""

df = pd.read_csv(StringIO(text), sep=r'\s+')

fnl = df.iloc[:, [0]].set_index(['filenamesLocal'], drop=False).dropna()
fns = df.iloc[:, [1]].set_index(['FilenamesServer'], drop=False).dropna()

print(fnl)

              filenamesLocal
filenamesLocal               
filea.csv           filea.csv
filec.csv           filec.csv
fileab.csv         fileab.csv
filexyz.csv       filexyz.csv
fileyh.csv         fileyh.csv

print(fns)

                FilenamesServer
FilenamesServer                
fileab.csv           fileab.csv
filea.csv             filea.csv
filec.csv             filec.csv

Align fnl and fns

aligned = pd.concat([fnl, fns], axis=1, sort=True)

print(aligned)

            filenamesLocal FilenamesServer
filea.csv        filea.csv       filea.csv
fileab.csv      fileab.csv      fileab.csv
filec.csv        filec.csv       filec.csv
filexyz.csv    filexyz.csv             NaN
fileyh.csv      fileyh.csv             NaN
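
The alignment works because pd.concat with axis=1 matches rows by index label rather than by position. A minimal sketch with two made-up Series (the names left, right and side_by_side are illustrative, not from the question):

```python
import pandas as pd

# Two hypothetical Series indexed by their own values, in different orders
left = pd.Series(["a", "c", "b", "x"], index=["a", "c", "b", "x"], name="left")
right = pd.Series(["b", "a", "c"], index=["b", "a", "c"], name="right")

# axis=1 places the Series side by side, aligning rows on index labels;
# a label missing from one side becomes NaN in that column.
# sort=True orders the union of the labels.
side_by_side = pd.concat([left, right], axis=1, sort=True)
print(side_by_side)
```

This is why both frames are indexed by file name first: equal names land on the same row regardless of where they sat in the original columns.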

master = aligned.filenamesLocal.combine_first(aligned.FilenamesServer)

print(master)

filea.csv        filea.csv
fileab.csv      fileab.csv
filec.csv        filec.csv
filexyz.csv    filexyz.csv
fileyh.csv      fileyh.csv
Name: filenamesLocal, dtype: object

Assign difference

aligned['Difference'] = master[aligned.isnull().any(axis=1)]

print(aligned)

            filenamesLocal FilenamesServer   Difference
filea.csv        filea.csv       filea.csv          NaN
fileab.csv      fileab.csv      fileab.csv          NaN
filec.csv        filec.csv       filec.csv          NaN
filexyz.csv    filexyz.csv             NaN  filexyz.csv
fileyh.csv      fileyh.csv             NaN   fileyh.csv
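
As an alternative to the index-alignment approach, here is a minimal sketch using an outer merge with indicator=True. It assumes the two columns have already been split into separate Series with the NaN padding dropped; the names local, server, merged, unmatched and name are illustrative, not from the question.

```python
import pandas as pd

# Hypothetical inputs: the two columns from the question as separate Series
local = pd.Series(
    ["filea.csv", "filec.csv", "fileab.csv", "filexyz.csv", "fileyh.csv"],
    name="filenamesLocal",
)
server = pd.Series(["fileab.csv", "filea.csv", "filec.csv"], name="FilenamesServer")

# An outer merge pairs equal names on one row; indicator=True adds a "_merge"
# column saying whether each row came from the left side, the right side, or both.
merged = pd.merge(
    local.to_frame(),
    server.to_frame(),
    left_on="filenamesLocal",
    right_on="FilenamesServer",
    how="outer",
    indicator=True,
)

# A row counts as a difference when the name exists on only one side
unmatched = merged["_merge"] != "both"
name = merged["filenamesLocal"].fillna(merged["FilenamesServer"])
merged["Difference"] = name.where(unmatched)

print(merged.drop(columns="_merge"))
```

Unlike the concat version, this would also flag names that exist only on the server, since the merge is symmetric. Row order after an outer merge can vary between pandas versions, so sort explicitly if the order matters.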
