Pandas Sort Columns And Find Difference

February 09, 2024

I have a dataframe and I want to sort it so that matching values in column A and column B end up in the same row. If a value has no match, it should go into a third column, C. My data has the columns filenamesLocal and FilenamesServer.

Solution 1:

Setup

import pandas as pd
from io import StringIO  # Python 3; the separate StringIO module was Python 2 only

text="""filenamesLocal          FilenamesServer
  filea.csv                  fileab.csv
  filec.csv                  filea.csv
  fileab.csv                 filec.csv
  filexyz.csv
  fileyh.csv"""

# sep=r"\s+" splits on runs of whitespace; delim_whitespace is deprecated in recent pandas
df = pd.read_csv(StringIO(text), sep=r"\s+")

fnl = df.iloc[:, [0]].set_index(['filenamesLocal'], drop=False).dropna()
fns = df.iloc[:, [1]].set_index(['FilenamesServer'], drop=False).dropna()

print(fnl)

              filenamesLocal
filenamesLocal               
filea.csv           filea.csv
filec.csv           filec.csv
fileab.csv         fileab.csv
filexyz.csv       filexyz.csv
fileyh.csv         fileyh.csv

print(fns)

                FilenamesServer
FilenamesServer                
fileab.csv           fileab.csv
filea.csv             filea.csv
filec.csv             filec.csv

Align fnl and fns

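# concat along axis=1 aligns rows on the index (the file names);
# sort=True keeps the row order shown below on current pandas versions
aligned = pd.concat([fnl, fns], axis=1, sort=True)

print(aligned)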

            filenamesLocal FilenamesServer
filea.csv        filea.csv       filea.csv
fileab.csv      fileab.csv      fileab.csv
filec.csv        filec.csv       filec.csv
filexyz.csv    filexyz.csv             NaN
fileyh.csv      fileyh.csv             NaN
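The alignment step relies on pd.concat matching rows by index label rather than by position. A minimal sketch of that behaviour, using made-up series (the names left and right and their values are illustrative, not part of the original data):

```python
import pandas as pd

# Hypothetical toy data: the same labels appear in a different order on each side.
left = pd.Series([1, 2, 3], index=["a", "b", "c"], name="left")
right = pd.Series([20, 10], index=["b", "a"], name="right")

# axis=1 places the series side by side and lines rows up by index label.
# Labels missing from one side ("c" here) get NaN in that column.
toy = pd.concat([left, right], axis=1)
print(toy)
```

Because the file names were set as the index of both fnl and fns, the same mechanism puts equal file names in the same row.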

master = aligned.filenamesLocal.combine_first(aligned.FilenamesServer)

print(master)

filea.csv        filea.csv
fileab.csv      fileab.csv
filec.csv        filec.csv
filexyz.csv    filexyz.csv
fileyh.csv      fileyh.csv
Name: filenamesLocal, dtype: object
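combine_first keeps the caller's values and only falls back to the other series where the caller is missing. A small sketch with made-up values (a, b and the row labels are illustrative assumptions):

```python
import pandas as pd

# Hypothetical toy series with gaps in different places.
a = pd.Series(["x", None, "z"], index=["r1", "r2", "r3"])
b = pd.Series(["p", "q", None], index=["r1", "r2", "r4"])

# Values from `a` win; holes in `a` are filled from `b` where `b` has a value.
# The result covers the union of both indexes.
filled = a.combine_first(b)
print(filled)
```

In the answer above, this produces one master column that holds a file name for every row, whichever side it came from.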

Assign difference

aligned['Difference'] = master[aligned.isnull().any(axis=1)]

print(aligned)

            filenamesLocal FilenamesServer   Difference
filea.csv        filea.csv       filea.csv          NaN
fileab.csv      fileab.csv      fileab.csv          NaN
filec.csv        filec.csv       filec.csv          NaN
filexyz.csv    filexyz.csv             NaN  filexyz.csv
fileyh.csv      fileyh.csv             NaN   fileyh.csv
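As an alternative that is not part of the original answer, the unmatched names can also be found with an outer merge and its indicator column. This is only a sketch: the column name "name" and the variables local, server, merged and difference are chosen here for illustration, and the data is rebuilt by hand to mirror the setup.

```python
import pandas as pd

# Sample data mirroring the question; None marks a missing server entry.
df = pd.DataFrame({
    "filenamesLocal": ["filea.csv", "filec.csv", "fileab.csv",
                       "filexyz.csv", "fileyh.csv"],
    "FilenamesServer": ["fileab.csv", "filea.csv", "filec.csv", None, None],
})

local = df["filenamesLocal"].dropna()
server = df["FilenamesServer"].dropna()

# Outer-merge the two lists on the file name itself. indicator=True adds a
# "_merge" column with "both", "left_only" or "right_only" for each row.
merged = pd.merge(
    local.to_frame("name"),
    server.to_frame("name"),
    on="name",
    how="outer",
    indicator=True,
)

# Names found on only one side are the difference.
difference = merged.loc[merged["_merge"] != "both", "name"].tolist()
print(sorted(difference))
```

The "_merge" column also tells you which side an unmatched file lives on, which the Difference column above does not.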
