Merge Multiple Csv Files With Same Name In 10 Different Subdirectory

October 27, 2022 Post a Comment

i have 10 different subdirectories with same file names in each directory ( 20 files per directory ) and column 0 is the index column in each file. e.g **strong text**DIRECTO

Solution 1:

There are many ways to do this, staying in Pandas I did the following.

With the file structure

root/  
├── dir1/  
│   ├── data_20170101_k   
│   ├── data_20170102_k    
│   ├── ...  
├── dir2/    
│   ├── data_20170101_k    
│   └── data_20170101_k  
│   └── ...   
└── ...

This code will work, it's a little verbose for explanation but you can shorten with implementation.

import glob
import pandas as pd

CONCAT_DIR = "/FILES_CONCAT/"

# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("root/*/*")], columns=["fullpath"])

#    fullpath
# 0  root\dir1\data_20170101_k.csv
# 1  root\dir1\data_20170102_k.csv
# 2  root\dir2\data_20170101_k.csv
# 3  root\dir2\data_20170102_k.csv

# Split the full path into directory and filename
files_split = files['fullpath'].str.rsplit("\\", 1, expand=True).rename(columns={0: 'path', 1:'filename'})

#    path       filename
# 0  root\dir1  data_20170101_k.csv
# 1  root\dir1  data_20170102_k.csv
# 2  root\dir2  data_20170101_k.csv
# 3  root\dir2  data_20170102_k.csv

# Join these into one DataFrame
files = files.join(files_split)

#    fullpath                       path        filename
# 0  root\dir1\data_20170101_k.csv  root\dir1   data_20170101_k.csv
# 1  root\dir1\data_20170102_k.csv  root\dir1   data_20170102_k.csv
# 2  root\dir2\data_20170101_k.csv  root\dir2   data_20170101_k.csv
# 3  root\dir2\data_20170102_k.csv  root\dir2   data_20170102_k.csv

# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath'] # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths] # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs) # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f) # Save dataframe

Solution 2:

This can be achieved in much simple way in shell as:

Baca Juga

find . -name "*.csv" | xargs cat > mergedCSV

(Note: Don't use .csv in extension as it will cause inconsistency with find. After this command is finished, file can be renamed as .csv

Learn Python Tutorials

Merge Multiple Csv Files With Same Name In 10 Different Subdirectory

Solution 1:

Solution 2:

Post a Comment for "Merge Multiple Csv Files With Same Name In 10 Different Subdirectory"