Loading HDF5 MATLAB Strings Into Python
Solution 1:
I assume you mean it is a cell array of strings in MATLAB? This output looks normal: the dataset is an array of objects (|O4 is the NumPy object datatype). Each object is an array of 2-byte integers (<u2 is the NumPy little-endian unsigned 2-byte integer datatype). h5py has no way of knowing that the dataset is a cell array of strings; it could just as well be a cell array of arbitrary 16-bit integers.
The easiest way to get the strings out is a list comprehension that uses unichr to convert each integer to a character, like this:
strlist = [u''.join(unichr(c) for c in h5file[obj_ref]) for obj_ref in dataset]
What this does is iterate over the dataset (for obj_ref in dataset) to create a new list. For each object reference, it dereferences the object (h5file[obj_ref]) to get an array of integers. It converts each integer into a character (unichr(c)) and joins those characters all together into a Unicode string (u''.join()).
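Note that this produces a list of Unicode strings. If you are absolutely sure that every string contains only ASCII characters, you can replace u'' by '' and unichr by chr. On Python 3, unichr no longer exists; chr and '' already produce Unicode strings, so use those in either case.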
Caveat: I don't have h5py; this post is based on my experiences with MATLAB and NumPy. You may need to adjust the syntax or iteration order to suit your dataset.
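For Python 3, the same dereference-and-join idea might be sketched as below. This is only a sketch: the helper names (iter_scalars, matlab_chars_to_str, read_cellstr) are made up for illustration, and it assumes that indexing the file with an object reference yields the character codes, as described above. Because the reference dataset and the character arrays are often stored as N x 1 or 1 x N arrays, the sketch flattens both before converting. The file and dataset names in the trailing comment are placeholders.

```python
def iter_scalars(values):
    """Yield the scalar items of an arbitrarily nested sequence.

    Handles flat and 2-D (N x 1 or 1 x N) layouts alike. Anything that
    cannot be iterated (an integer, an object reference) is a scalar.
    """
    if isinstance(values, (str, bytes)):
        yield values
        return
    try:
        items = iter(values)
    except TypeError:
        yield values
        return
    for item in items:
        yield from iter_scalars(item)


def matlab_chars_to_str(codes):
    """Join 16-bit character codes into a Python 3 str."""
    return ''.join(chr(int(c)) for c in iter_scalars(codes))


def read_cellstr(h5file, dataset):
    """Dereference each object reference in `dataset` and decode it."""
    return [matlab_chars_to_str(h5file[ref]) for ref in iter_scalars(dataset)]


# Possible usage with h5py (file and dataset names are hypothetical):
#
#     import h5py
#     with h5py.File('data.mat', 'r') as f:
#         names = read_cellstr(f, f['names'])
```

The helpers only rely on iteration and indexing, so they can be tried on plain lists and dicts before pointing them at a real file.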
Solution 2:
You can get the original MATLAB class name of Group and Dataset objects with
dataset.attrs['MATLAB_class']
If dataset contains a string, this returns b'char'.
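A small check built on that attribute could look like the following. The function names matlab_class and is_matlab_string are hypothetical, and the sketch assumes the object exposes a dict-like attrs mapping, as h5py Group and Dataset objects do. It also assumes the attribute may come back as either bytes or str, so it normalizes before comparing.

```python
def matlab_class(obj):
    """Return the MATLAB class name stored on `obj` as a str, or None."""
    value = obj.attrs.get('MATLAB_class')
    if isinstance(value, bytes):
        # The attribute is typically stored as bytes, e.g. b'char'.
        value = value.decode('ascii')
    return value


def is_matlab_string(obj):
    """True if `obj` was saved from a MATLAB char array."""
    return matlab_class(obj) == 'char'
```

This makes it possible to decide per dataset whether the 16-bit integers should be decoded as text or left as numbers.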