Merge Multiple CSV Files With the Same Name in 10 Different Subdirectories
I have 10 different subdirectories with the same file names in each directory (20 files per directory), and column 0 is the index column in each file. e.g. DIRECTO
Solution 1:
There are many ways to do this; staying in pandas, I did the following.
Given this file structure:
root/
├── dir1/
│   ├── data_20170101_k.csv
│   ├── data_20170102_k.csv
│   └── ...
├── dir2/
│   ├── data_20170101_k.csv
│   ├── data_20170102_k.csv
│   └── ...
└── ...
This code works. It is deliberately verbose for the sake of explanation, and you can shorten it in your own implementation.
import glob
import pandas as pd
CONCAT_DIR = "/FILES_CONCAT/"
# Use the glob module to list every file one level below the root directory, then build a DataFrame from the paths.
files = pd.DataFrame(glob.glob("root/*/*"), columns=["fullpath"])
# fullpath
# 0 root\dir1\data_20170101_k.csv
# 1 root\dir1\data_20170102_k.csv
# 2 root\dir2\data_20170101_k.csv
# 3 root\dir2\data_20170102_k.csv
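# Split the full path into directory and filename.
# "\\" is the Windows path separator (use "/" on Linux/macOS); n must be passed by keyword in pandas 2.x.
files_split = files['fullpath'].str.rsplit("\\", n=1, expand=True).rename(columns={0: 'path', 1: 'filename'})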
# path filename
# 0 root\dir1 data_20170101_k.csv
# 1 root\dir1 data_20170102_k.csv
# 2 root\dir2 data_20170101_k.csv
# 3 root\dir2 data_20170102_k.csv
# Join these into one DataFrame
files = files.join(files_split)
# fullpath path filename
# 0 root\dir1\data_20170101_k.csv root\dir1 data_20170101_k.csv
# 1 root\dir1\data_20170102_k.csv root\dir1 data_20170102_k.csv
# 2 root\dir2\data_20170101_k.csv root\dir2 data_20170101_k.csv
# 3 root\dir2\data_20170102_k.csv root\dir2 data_20170102_k.csv
# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath']  # Full paths of every file with this name
    dfs = [pd.read_csv(path, header=None) for path in paths]  # One DataFrame per CSV file
    concat_df = pd.concat(dfs)  # Concatenate the DataFrames into one
    concat_df.to_csv(CONCAT_DIR + f)  # Save the merged DataFrame under the shared file name
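The grouping step above splits paths on a backslash, which only works on Windows. As a rough, portable sketch of the same idea, the paths can be grouped by file name with pathlib instead of a helper DataFrame. The function name group_by_filename and the "*.csv" default pattern are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict
from pathlib import Path


def group_by_filename(root, pattern="*.csv"):
    """Map each file name to the sorted list of paths that share it.

    Only files exactly one directory below `root` are considered.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).glob(f"*/{pattern}")):
        if path.is_file():
            # path.name is the bare file name, whatever the OS separator is
            groups[path.name].append(path)
    return dict(groups)
```

Each list of paths returned here can then be fed to pd.read_csv and pd.concat exactly as in the loop above, with the dictionary key serving as the output file name.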
Solution 2:
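This can be done much more simply in the shell. Note that the command concatenates every CSV file under the current directory into a single file, so it does not group by file name, and any header rows are repeated: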
find . -name "*.csv" | xargs cat > mergedCSV
(Note: Do not give the output file a .csv extension, because find would then match the output file itself. Once the command has finished, the file can be renamed to .csv.)
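For comparison, here is a minimal Python sketch of the same "concatenate everything" idea that keeps only the first header row and skips the output file itself. The function name concat_csv_files and the has_header flag are hypothetical, and the sketch assumes each header fits on a single line.

```python
from pathlib import Path


def concat_csv_files(root, out_path, has_header=True):
    """Append every *.csv under `root` (recursively) to `out_path`.

    Returns the number of input files that were merged.
    """
    root, out_path = Path(root), Path(out_path)
    # Collect the inputs first and exclude the output file if it lives under root
    inputs = [p for p in sorted(root.rglob("*.csv"))
              if p.resolve() != out_path.resolve()]
    with out_path.open("w", newline="") as out:
        for i, path in enumerate(inputs):
            with path.open(newline="") as src:
                lines = src.readlines()
            if has_header and i > 0:
                lines = lines[1:]  # drop the repeated header row
            if lines and not lines[-1].endswith("\n"):
                lines[-1] += "\n"  # keep the next file on its own line
            out.writelines(lines)
    return len(inputs)
```

Because the output path is filtered out of the input list, the merged file can carry a .csv extension from the start in this version.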