Trying To Size Down HDF5 File By Changing Index Field Types Using H5py
I have a very large CSV File (~12Gb) that looks something like this: posX,posY,posZ,eventID,parentID,clockTime -117.9853515625,60.2998046875,0.29499998688697815,0,0,0 -117.985
Solution 1:
Have you considered the numpy
module?
It has a handy function (genfromtxt
) to read CSV data with headers into a Numpy array. You define the dtype. The array is suitable for loading into HDF5 with the h5py.create_dataset()
function.
See code below. I included 2 print statements. The first shows the dtype names created from the CSV headers. The second shows how you can access the data in the numpy array by field (column) name.
import h5py
import numpy as np
PATH_csv = 'SO_55576601.csv'
csv_dtype= ('f8', 'f8', 'f8', 'i4', 'i4', 'i4' )
csv_data = np.genfromtxt(PATH_csv, dtype=csv_dtype, delimiter=',', names=True)
print (csv_data.dtype.names)
print (csv_data['posX'])
with h5py.File('SO_55576601.h5', 'w') as h5f:
dset = h5f.create_dataset('CSV_data', data=csv_data)
h5f.close()
Post a Comment for "Trying To Size Down HDF5 File By Changing Index Field Types Using H5py"