Loading A Dataset In Python (numpy) When There Are Variable Spaces Delimiting Columns
I have a big dataset that contains numeric data, and in some of its rows a variable number of spaces delimits the columns, like:
4 5 6
7 8 9
2 3 4
When I use this line: dataset=numpy.lo
Solution 1:
The default for delimiter is 'any whitespace'. If you leave the delimiter argument out, loadtxt copes with multiple spaces.
>>> from io import StringIO
>>> dataset = StringIO('''\
... 4 5 6
... 7 8 9
... 2 3 4''')
>>> import numpy
>>> dataset_as_numpy = numpy.loadtxt(dataset)
>>> dataset_as_numpy
array([[4., 5., 6.],
       [7., 8., 9.],
       [2., 3., 4.]])
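To see why dropping the argument matters, here is a minimal sketch that parses the same text twice. It assumes the failing call passed an explicit single-space delimiter (the line in the question is cut off, so that is only a guess), and the names `raw`, `default_split`, and `explicit_error` are purely illustrative.

```python
from io import StringIO

import numpy as np

# Illustrative rows with a variable number of spaces between columns.
raw = "4 5 6\n7 8     9\n2 3 4\n"

# No delimiter argument: loadtxt splits on any run of whitespace.
default_split = np.loadtxt(StringIO(raw))

# Assumed failing case: an explicit single-space delimiter treats every
# space as a separator, so repeated spaces produce extra, empty fields.
try:
    np.loadtxt(StringIO(raw), delimiter=" ")
    explicit_error = None
except ValueError as exc:
    explicit_error = exc

print(default_split.shape)
print("explicit delimiter failed:", explicit_error is not None)
```

If the file really does need an explicit delimiter (a comma, say), that is a separate case; the sketch only covers whitespace-separated columns.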
Solution 2:
Use the numpy.genfromtxt function:
>>> import numpy as np
>>> dataset = np.genfromtxt('dataset.txt')
>>> dataset
array([[  4.,   5.,   6.],
       [  7.,   8.,  19.],
       [  2.,   3.,   4.],
       [  1.,   3., 204.]])
This is from the numpy documentation:
By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.
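That behaviour can be checked with a short sketch. The in-memory text below, which mixes tabs and repeated spaces, is made up for illustration and stands in for a file on disk.

```python
from io import StringIO

import numpy as np

# Illustrative rows separated by single spaces, repeated spaces and tabs.
raw = "4 5   6\n7\t8  19\n2   3\t\t4\n"

# delimiter is left at its default (None), so any run of whitespace
# counts as a single separator.
data = np.genfromtxt(StringIO(raw))

print(data.shape)
print(data)
```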
Hope this helps!