Pandas Sort Columns And Find Difference
I have a DataFrame and I want to sort it so that column A == column B on each row. If a value has no match, it should be put into a column C. My data has two columns, filenamesLocal and FilenamesServer.
Solution 1:
Setup
import pandas as pd
from io import StringIO
text="""filenamesLocal FilenamesServer
filea.csv fileab.csv
filec.csv filea.csv
fileab.csv filec.csv
filexyz.csv
fileyh.csv"""
df = pd.read_csv(StringIO(text), sep=r'\s+')
fnl = df.iloc[:, [0]].set_index(['filenamesLocal'], drop=False).dropna()
fns = df.iloc[:, [1]].set_index(['FilenamesServer'], drop=False).dropna()
print(fnl)
                filenamesLocal
filenamesLocal
filea.csv            filea.csv
filec.csv            filec.csv
fileab.csv          fileab.csv
filexyz.csv        filexyz.csv
fileyh.csv          fileyh.csv
print(fns)
                 FilenamesServer
FilenamesServer
fileab.csv            fileab.csv
filea.csv              filea.csv
filec.csv              filec.csv
Align fnl and fns
# sort=True keeps the joined index in sorted order regardless of pandas version
aligned = pd.concat([fnl, fns], axis=1, sort=True)
print(aligned)
             filenamesLocal  FilenamesServer
filea.csv         filea.csv        filea.csv
fileab.csv       fileab.csv       fileab.csv
filec.csv         filec.csv        filec.csv
filexyz.csv     filexyz.csv              NaN
fileyh.csv       fileyh.csv              NaN
master = aligned.filenamesLocal.combine_first(aligned.FilenamesServer)
print(master)
filea.csv        filea.csv
fileab.csv      fileab.csv
filec.csv        filec.csv
filexyz.csv    filexyz.csv
fileyh.csv      fileyh.csv
Name: filenamesLocal, dtype: object
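In the sample data every server file also exists locally, so master ends up identical to the local column. combine_first starts to matter when a file exists only on the server. A minimal sketch of that case, using two made-up Series (the names local and server and their values are illustrative, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a.csv exists only locally, b.csv only on the server.
local = pd.Series(["a.csv", np.nan], index=["a.csv", "b.csv"])
server = pd.Series([np.nan, "b.csv"], index=["a.csv", "b.csv"])

# combine_first keeps the values of `local` and fills its gaps from `server`,
# matching rows by index label.
master = local.combine_first(server)
print(master.tolist())
```

The result holds one name per row, whichever side supplied it.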
Assign the difference
aligned['Difference'] = master[aligned.isnull().any(axis=1)]
print(aligned)
             filenamesLocal  FilenamesServer   Difference
filea.csv         filea.csv        filea.csv          NaN
fileab.csv       fileab.csv       fileab.csv          NaN
filec.csv         filec.csv        filec.csv          NaN
filexyz.csv     filexyz.csv              NaN  filexyz.csv
fileyh.csv       fileyh.csv              NaN   fileyh.csv
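The steps above can be collected into one Python 3 script. This is a sketch under a few assumptions: the helper name align_filenames is made up for illustration, sep=r"\s+" stands in for whitespace splitting, and sort=True is passed to pd.concat so the row order does not depend on the pandas version.

```python
from io import StringIO

import pandas as pd

text = """filenamesLocal FilenamesServer
filea.csv fileab.csv
filec.csv filea.csv
fileab.csv filec.csv
filexyz.csv
fileyh.csv"""


def align_filenames(df):
    """Line up the two filename columns and flag unmatched names."""
    # Index each column by its own values so rows can be matched by name.
    fnl = df[["filenamesLocal"]].dropna().set_index("filenamesLocal", drop=False)
    fns = df[["FilenamesServer"]].dropna().set_index("FilenamesServer", drop=False)
    # Outer-join the two frames on their indexes.
    aligned = pd.concat([fnl, fns], axis=1, sort=True)
    # One name per row, taken from whichever side has it.
    master = aligned["filenamesLocal"].combine_first(aligned["FilenamesServer"])
    # Only rows missing on one side are selected; assignment aligns on the
    # index, so every other row gets NaN in the new column.
    aligned["Difference"] = master[aligned.isnull().any(axis=1)]
    return aligned


df = pd.read_csv(StringIO(text), sep=r"\s+")
result = align_filenames(df)
print(result)
```

With the sample data, only filexyz.csv and fileyh.csv appear in the Difference column, because they are the only names without a counterpart on the server side.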