Loading Hdf5 Matlab Strings Into Python

I'm having trouble reading an HDF5 MATLAB 7.3 file with Python. I'm using h5py 2.0.1. I can read all the matrices that are stored in the file, but I cannot read a list of str

Solution 1:

I assume you mean it is a cell array of strings in MATLAB? This output looks normal: the dataset is an array of objects (|O4 is the NumPy object datatype). Each object is an array of 2-byte integers (<u2 is the NumPy little-endian unsigned 2-byte integer datatype). h5py has no way of knowing that the dataset is a cell array of strings; it could just as well be a cell array of arbitrary 16-bit integers.

The easiest way to get the strings out is a list comprehension that uses unichr to convert each character code, like this:

strlist = [u''.join(unichr(c) for c in h5file[obj_ref]) for obj_ref in dataset]

What this does is iterate over the dataset (for obj_ref in dataset) to create a new list. For each object reference, it dereferences the object (h5file[obj_ref]) to get an array of integers. It converts each integer into a character (unichr(c)) and joins those characters all together into a Unicode string (u''.join()).
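On Python 3, unichr no longer exists and chr covers the full Unicode range. Below is a minimal sketch of the same idea for Python 3; it has not been tested against a real MATLAB file. The helper names flatten, decode_codes and read_cell_strings are hypothetical. h5file is assumed to be an open h5py.File and dataset the array of object references. The flatten step is an assumption about layout: these arrays may be stored as 2-D, in which case iterating yields rows rather than single values.

```python
def flatten(values):
    """Yield the scalar items of an arbitrarily nested iterable, in order."""
    for v in values:
        try:
            # If v is itself iterable (e.g. a row of a 2-D array), descend into it.
            yield from flatten(v)
        except TypeError:
            # v is a scalar (an integer code or an object reference).
            yield v


def decode_codes(codes):
    """Join 16-bit character codes into a str.

    Assumes each value is a complete code point, as in the original answer.
    """
    return "".join(chr(int(c)) for c in flatten(codes))


def read_cell_strings(h5file, dataset):
    """Dereference each object reference in dataset and decode it as a string."""
    return [decode_codes(h5file[obj_ref]) for obj_ref in flatten(dataset)]
```

With a real file this would be called as read_cell_strings(h5file, h5file['name']), where 'name' stands for whatever your cell array variable is called.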

Note that this produces a list of unicode strings. If you are absolutely sure that every string contains only ASCII characters, you can replace u'' by '' and unichr by chr.

Caveat: I don't have h5py; this post is based on my experience with MATLAB and NumPy. You may need to adjust the syntax or iteration order to suit your dataset.

Solution 2:

You can get the original MATLAB class name of Group and Dataset objects with

dataset.attrs['MATLAB_class']

If the dataset contains a string, this returns b'char'.
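This attribute can be used to decide whether a dataset should be decoded as text at all. Here is a small sketch, assuming the attribute value may come back as either bytes or str depending on the h5py version. The function name is_matlab_char is hypothetical, and dataset is assumed to be any object with an h5py-style attrs mapping.

```python
def is_matlab_char(dataset):
    """Return True if the dataset's MATLAB_class attribute says it is a char array."""
    # attrs behaves like a dict; .get returns None when the attribute is missing.
    cls = dataset.attrs.get("MATLAB_class")
    if isinstance(cls, bytes):
        # Normalise b'char' to 'char' so one comparison covers both cases.
        cls = cls.decode("ascii")
    return cls == "char"
```

Checking this before converting integers to characters avoids misreading a cell array of genuine 16-bit integers as text.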
