Trying To Size Down HDF5 File By Changing Index Field Types Using H5py

February 27, 2023

I have a very large CSV file (~12 GB) that looks something like this:

posX,posY,posZ,eventID,parentID,clockTime
-117.9853515625,60.2998046875,0.29499998688697815,0,0,0
-117.985

Solution 1:

Have you considered the NumPy module? Its genfromtxt function reads CSV data with headers into a structured NumPy array, using a dtype that you define. That array can then be written straight to HDF5 with h5py's create_dataset() method, which is called on a File or Group object.

See the code below. I included two print statements: the first shows the dtype names created from the CSV headers, and the second shows how to access the data in the NumPy array by field (column) name.

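import h5py
import numpy as np

PATH_csv = 'SO_55576601.csv'
# One type per CSV column: three float64 positions, then three int32 fields
csv_dtype = ('f8', 'f8', 'f8', 'i4', 'i4', 'i4')

# names=True takes the field names from the CSV header row
csv_data = np.genfromtxt(PATH_csv, dtype=csv_dtype, delimiter=',', names=True)

print(csv_data.dtype.names)   # field names read from the header
print(csv_data['posX'])       # access one column by field name

# The with block closes the file on exit, so no explicit close() is needed
with h5py.File('SO_55576601.h5', 'w') as h5f:
    dset = h5f.create_dataset('CSV_data', data=csv_data)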
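
Since the goal is a smaller file, here is a rough sketch of how the dtype choice changes the per-row size. It is not part of the original answer. It assumes single precision ('f4') is enough for the three position fields, which you would need to check against your own data, and it adds gzip compression as an optional extra. The in-memory sample rows and the output file name are made up for illustration.

```python
import io
import os
import tempfile

import h5py
import numpy as np

FIELDS = ('posX', 'posY', 'posZ', 'eventID', 'parentID', 'clockTime')

# Layout used in the answer above: float64 positions, int32 for the rest.
wide_dtype = np.dtype(list(zip(FIELDS, ('f8', 'f8', 'f8', 'i4', 'i4', 'i4'))))
# Assumed narrower layout: float32 positions (only if that precision suffices).
narrow_dtype = np.dtype(list(zip(FIELDS, ('f4', 'f4', 'f4', 'i4', 'i4', 'i4'))))

# Hypothetical two-row sample that mirrors the CSV header from the question.
sample_csv = io.StringIO(
    "posX,posY,posZ,eventID,parentID,clockTime\n"
    "-117.9853515625,60.2998046875,0.29499998688697815,0,0,0\n"
    "-117.9853515625,60.2998046875,0.29499998688697815,1,0,5\n"
)

# The dtype already carries the field names, so just skip the header row.
data = np.genfromtxt(sample_csv, dtype=narrow_dtype, delimiter=',', skip_header=1)

# Write to a temporary file; gzip is optional and trades write speed for size.
h5_path = os.path.join(tempfile.mkdtemp(), 'narrow_example.h5')
with h5py.File(h5_path, 'w') as h5f:
    h5f.create_dataset('CSV_data', data=data, compression='gzip')

# Read the dataset back to confirm the compound layout survived the round trip.
with h5py.File(h5_path, 'r') as h5f:
    stored = h5f['CSV_data'][:]

print(wide_dtype.itemsize, narrow_dtype.itemsize)  # bytes per row for each layout
print(stored.dtype.names)
```

With these layouts each row shrinks from 36 bytes to 24 bytes before compression. Whether the integer fields can also be narrowed depends on the range of your eventID, parentID and clockTime values, which the excerpt does not show.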
