
Trying To Size Down HDF5 File By Changing Index Field Types Using H5py

I have a very large CSV File (~12Gb) that looks something like this: posX,posY,posZ,eventID,parentID,clockTime -117.9853515625,60.2998046875,0.29499998688697815,0,0,0 -117.985

Solution 1:

Have you considered the NumPy module? Its genfromtxt function reads CSV data with a header row into a structured NumPy array, using the dtype you define for each column. That array can then be written straight to HDF5 with the create_dataset() method of an h5py File (or Group) object.

See the code below. I included two print statements. The first shows the dtype field names created from the CSV header. The second shows how to access the data in the NumPy array by field (column) name.

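import h5py
import numpy as np

PATH_csv = 'SO_55576601.csv'
# One type per CSV column: three float64 positions, then three int32 fields
csv_dtype = ('f8', 'f8', 'f8', 'i4', 'i4', 'i4')

# names=True takes the field names from the CSV header row
csv_data = np.genfromtxt(PATH_csv, dtype=csv_dtype, delimiter=',', names=True)

print(csv_data.dtype.names)  # field names read from the header
print(csv_data['posX'])      # one column, accessed by field name

# The with block closes the file on exit, so no explicit close() is needed
with h5py.File('SO_55576601.h5', 'w') as h5f:
    dset = h5f.create_dataset('CSV_data', data=csv_data)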
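
Since the goal is a smaller file, it may help to check how the field types affect the size of each stored row. The following is only a sketch under my own assumptions: the three-row sample, the temporary file name demo.h5, and the narrower dtype with float32 positions are made up for illustration, and float32 keeps only about 7 significant digits, so check that this is acceptable for your data before using it.

```python
import os
import tempfile

import h5py
import numpy as np

fields = ('posX', 'posY', 'posZ', 'eventID', 'parentID', 'clockTime')

# dtype matching the code above: float64 positions, int32 for the rest
wide = np.dtype(list(zip(fields, ('f8', 'f8', 'f8', 'i4', 'i4', 'i4'))))
# Hypothetical narrower variant: float32 positions (assumption, loses precision)
narrow = np.dtype(list(zip(fields, ('f4', 'f4', 'f4', 'i4', 'i4', 'i4'))))

print(wide.itemsize, narrow.itemsize)  # bytes per row: 36 24

# Small made-up sample standing in for the real CSV data
rows = np.zeros(3, dtype=wide)
rows['posX'] = [-117.9853515625, 1.0, 2.0]
rows['eventID'] = [0, 1, 2]

# Copy field by field into the narrower layout
narrow_rows = np.empty(rows.shape, dtype=narrow)
for name in fields:
    narrow_rows[name] = rows[name]

# Write to a throwaway file, then read it back to confirm what was stored
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(path, 'w') as h5f:
    h5f.create_dataset('CSV_data', data=narrow_rows)

with h5py.File(path, 'r') as h5f:
    dset = h5f['CSV_data']
    stored_names = dset.dtype.names
    stored_itemsize = dset.dtype.itemsize
    pos_x = dset['posX']  # read a single field by name

print(stored_names)
print(pos_x)
```

The per-row size times the number of rows gives a rough lower bound for the uncompressed dataset, so going from 36 to 24 bytes per row cuts about a third.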
