
Loading A Dataset In Python (numpy) When There Are Variable Spaces Delimiting Columns

I have a big dataset that contains numeric data, and in some of its rows a variable number of spaces delimits the columns, like: 4 5 6 7 8 9 2 3 4 When I use this line: dataset=numpy.lo

Solution 1:

The default for delimiter is 'any whitespace'. If you leave the delimiter argument out, loadtxt copes with multiple spaces.

>>> from io import StringIO
>>> dataset = StringIO('''\
... 4 5 6
... 7 8     9
... 2 3 4''')
>>> import numpy
>>> dataset_as_numpy = numpy.loadtxt(dataset)
>>> dataset_as_numpy
array([[ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [ 2.,  3.,  4.]])
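
To see why leaving the delimiter at its default matters, here is a minimal sketch that contrasts it with an explicit single-space delimiter. The question's own call is cut off, so the `delimiter=" "` case is only an assumption about one way this problem can arise, and the sample text is made up for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical sample: the second row has several spaces between two columns.
text = "4 5 6\n7 8     9\n2 3 4"

# Default delimiter (None): a run of whitespace counts as one separator.
default_result = np.loadtxt(StringIO(text))

# Explicit single-space delimiter: every space starts a new field, so the
# second row no longer parses as three numbers and loadtxt raises ValueError.
try:
    np.loadtxt(StringIO(text), delimiter=" ")
    explicit_failed = False
except ValueError:
    explicit_failed = True

print(default_result.shape)
print(explicit_failed)
```

If an existing call passes a delimiter such as `" "`, dropping that argument is likely the smallest change that makes the file load.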

Solution 2:

Use the numpy.genfromtxt function:

>>> import numpy as np
>>> dataset = np.genfromtxt('dataset.txt')
>>> dataset
array([[   4.,    5.,    6.],
       [   7.,    8.,   19.],
       [   2.,    3.,    4.],
       [   1.,    3.,  204.]])

This is from the numpy documentation:

By default, genfromtxt assumes delimiter=None, meaning that the line is split along white spaces (including tabs) and that consecutive white spaces are considered as a single white space.
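
The behaviour described in that passage can be checked without a file on disk. This is a small sketch, assuming an in-memory sample (the text below is invented for illustration) in which tabs and repeated spaces are mixed freely between columns.

```python
from io import StringIO

import numpy as np

# Hypothetical sample: columns separated by tabs, single spaces, and runs of spaces.
text = "4\t5   6\n7  8\t\t9\n2 3 4\n"

# delimiter is left at its default (None), so any run of whitespace,
# tabs included, is treated as a single separator.
data = np.genfromtxt(StringIO(text))

print(data.shape)
print(data)
```

Every row comes back with three columns even though the separators differ from line to line.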

Hope this helps!
