
Sub Value And Add New Column Pandas

I am trying to read a few files from a path, as an extension of my previous question. The answer given by Jianxun definitely makes sense, but I am getting a key error. Very very new to pan

Solution 1:

import pandas as pd
import numpy as np

# paths to your csv files
csv_file1 = '/home/Jian/Downloads/stack_flow_bundle/Transition_Data/Test_1.csv'
csv_file2 = '/home/Jian/Downloads/stack_flow_bundle/Transition_Data/Test_2.csv'
master_csv_file = '/home/Jian/Downloads/stack_flow_bundle/Data_repository/master_lac_Test.csv'
csv_file_all = [csv_file1, csv_file2]

# read each csv file into a DataFrame using a list comprehension

df_all = [pd.read_csv(csv_file) for csv_file in csv_file_all]

# processing
# =====================================================
# concat along axis=0, outer join on axis=1
merged = pd.concat(df_all, axis=0, ignore_index=True, join='outer').set_index('Ids')

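# custom function to merge duplicate Ids (axis=0):
# forward-fill within the group, then keep the last row,
# i.e. the last non-null value of every column
def apply_func(group):
    # fillna(method='ffill') is deprecated in recent pandas; ffill() is equivalent
    return group.ffill().iloc[-1]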

# remove Ids duplicates
merged_unique = merged.groupby(level='Ids').apply(apply_func)

# read the master file, indexed and sorted by Ids

df_master = pd.read_csv(master_csv_file, index_col=['Ids']).sort_index()

# select matching records and horizontal concat
df_matched = pd.concat([df_master,merged_unique.reindex(df_master.index)], axis=1)

# subtract the master column (first column) from every other column, row by row
df_matched.iloc[:, 1:] = df_matched.iloc[:, 1:].sub(df_matched.iloc[:, 0], axis=0)

print(df_matched)

      00:00:00  00:30:00  00:45:00  12:00:00  12:45:00
Ids                                                   
1234      1000      -500      -900       NaN      8865
2341       563      -163      -163      9302       NaN
7352       345       155       255      8624       NaN
8435      5243     -4943     -5043       NaN      3726
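
If you want to try the same steps without the files on disk, here is a minimal sketch that uses in-memory buffers instead. The CSV contents are made up for illustration (they are not the original files), and `groupby(...).last()` is swapped in for the custom function because it also keeps the last non-null value per column. Since the question is cut off, the check on the 'Ids' column is only a guess at one possible cause of the key error (a missing or padded column name).

```python
import io

import pandas as pd

# Hypothetical stand-ins for Test_1.csv, Test_2.csv and the master file
test_1 = io.StringIO("Ids,00:30:00,00:45:00\n1234,500,100\n2341,400,400\n")
test_2 = io.StringIO("Ids,12:00:00\n2341,9865\n1234,\n")
master = io.StringIO("Ids,00:00:00\n1234,1000\n2341,563\n")

frames = [pd.read_csv(buf) for buf in (test_1, test_2)]

# Assumption: set_index('Ids') raises KeyError when a file lacks that exact
# column name, so strip stray whitespace and fail early with a clear message.
for frame in frames:
    frame.columns = frame.columns.str.strip()
    if "Ids" not in frame.columns:
        raise KeyError(f"'Ids' column not found, got {list(frame.columns)}")

# Stack the files vertically; columns missing from a file become NaN
merged = pd.concat(frames, axis=0, ignore_index=True, join="outer").set_index("Ids")

# Collapse duplicate Ids, keeping the last non-null value in each column
merged_unique = merged.groupby(level="Ids").last()

df_master = pd.read_csv(master, index_col="Ids").sort_index()

# Subtract the master column from every other column, aligned on Ids
diff = merged_unique.reindex(df_master.index).sub(df_master["00:00:00"], axis=0)
df_matched = pd.concat([df_master, diff], axis=1)

print(df_matched)
```

Computing the difference first and concatenating afterwards avoids assigning back through `iloc`, but the result has the same layout: the master column first, then one difference column per time stamp.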
