Storing Python Objects In A Python List Vs. A Fixed-length Numpy Array
Solution 1:
Don't use object arrays in numpy for things like this.
They defeat the basic purpose of a numpy array, and while they're useful in a tiny handful of situations, they're almost always a poor choice.
Yes, accessing an individual element of a numpy array in Python, or iterating through a numpy array in Python, is slower than the equivalent operation with a list. (Which is why you should never do something like y = [item * 2 for item in x] when x is a numpy array.)
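To make that parenthetical concrete, here is a minimal sketch (Python 3, with an arbitrary small array chosen purely for illustration) comparing the element-by-element comprehension with the equivalent whole-array expression. Both give the same values; only the second keeps the loop inside numpy.

```python
import numpy as np

# Arbitrary illustrative data; the size is an assumption, not from the answer.
x = np.arange(10, dtype=float)

# Slow pattern: every iteration pulls one element out as a Python-level object.
y_slow = [item * 2 for item in x]

# Preferred pattern: a single vectorized operation over the whole array.
y_fast = x * 2

# Same values either way; y_fast is still a numpy array, y_slow is a list.
print(np.array_equal(np.array(y_slow), y_fast))
```

The gap between the two only becomes noticeable on large arrays, so it is worth timing on your own data before drawing conclusions.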
Numpy object arrays will have a slightly lower memory overhead than a list, but if you're storing that many individual python objects, you're going to run into other memory problems first.
Numpy is first and foremost a memory-efficient, multidimensional array container for uniform numerical data. If you want to hold arbitrary objects in a numpy array, you probably want a list, instead.
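A small sketch of that distinction, assuming a hypothetical `Thing` class as a stand-in for "arbitrary objects": an object array only stores references to Python objects, while a float array stores the numbers themselves and supports fast vectorized math.

```python
import numpy as np

class Thing:
    """Hypothetical stand-in for an arbitrary Python object."""
    def __init__(self, value):
        self.value = value

things = [Thing(float(i)) for i in range(5)]

# An object array: numpy stores only references to the Python objects.
obj_arr = np.empty(len(things), dtype=object)
obj_arr[:] = things

# A numeric array: the values themselves live in one uniform buffer.
num_arr = np.array([t.value for t in things])

print(obj_arr.dtype)   # object
print(num_arr.dtype)   # float64

# Vectorized math works directly on the numeric array...
doubled = num_arr * 2

# ...but the object array falls back to per-element Python calls,
# and fails here because Thing defines no __mul__.
try:
    obj_arr * 2
except TypeError:
    print("object array: no fast path, and Thing has no __mul__")
```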
My point is that if you want to use numpy effectively, you may need to re-think how you're structuring things.
Instead of storing each object instance in a numpy array, store your numerical data in a numpy array, and if you need separate objects for each row/column/whatever, store an index into that array in each instance.
This way you can operate on the numerical arrays quickly (i.e. using numpy instead of list comprehensions).
As a quick illustration of what I'm talking about, here's a trivial example that doesn't use numpy:
from random import random

class PointSet(object):
    def __init__(self, numpoints):
        self.points = [Point(random(), random()) for _ in range(numpoints)]

    def update(self):
        """Update along a random walk, one point at a time."""
        for point in self.points:
            point.x += random() - 0.5
            point.y += random() - 0.5

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = PointSet(100000)
point = points.points[10]

for _ in range(1000):
    points.update()
print('Position of one point out of 100000:', point.x, point.y)
And a similar example using numpy arrays:
import numpy as np

class PointSet(object):
    def __init__(self, numpoints):
        # All coordinates live in one (numpoints, 2) float array.
        self.coords = np.random.random((numpoints, 2))
        self.points = [Point(i, self.coords) for i in range(numpoints)]

    def update(self):
        """Update along a random walk."""
        # The "+=" is crucial here... We have to update "coords" in-place, in
        # this case.
        self.coords += np.random.random(self.coords.shape) - 0.5

class Point(object):
    def __init__(self, i, coords):
        self.i = i            # row index of this point in the shared array
        self.coords = coords  # reference to the shared array, not a copy

    @property
    def x(self):
        return self.coords[self.i, 0]

    @property
    def y(self):
        return self.coords[self.i, 1]

points = PointSet(100000)
point = points.points[10]

for _ in range(1000):
    points.update()
print('Position of one point out of 100000:', point.x, point.y)
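The comment about "+=" in update deserves a small demonstration. The sketch below (Python 3, with made-up variable names and a tiny array) shows that an in-place update is visible through every existing reference to the array, while `coords = coords + ...` builds a new array and leaves older references pointing at stale data.

```python
import numpy as np

coords = np.zeros((3, 2))
held_by_point = coords          # stands in for the reference each Point keeps

# In-place update: modifies the existing buffer, so the held reference sees it.
coords += 1.0
print(held_by_point[0, 0])      # 1.0

# Rebinding: "coords + 1.0" allocates a new array and the name now points to it.
coords = coords + 1.0
print(coords[0, 0])             # 2.0
print(held_by_point[0, 0])      # 1.0, stale
print(held_by_point is coords)  # False
```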
There are other ways to do this (you may want to avoid storing a reference to a specific numpy array in each point, for example), but I hope it's a useful example.
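One such alternative, sketched here as an assumption rather than as what the answer had in mind: have each Point hold a reference to its parent PointSet (the `parent` attribute is a name made up for this sketch) and look up `coords` on every access. The array can then be replaced wholesale without the points going stale.

```python
import numpy as np

class PointSet:
    def __init__(self, numpoints):
        self.coords = np.random.random((numpoints, 2))
        self.points = [Point(i, self) for i in range(numpoints)]

    def update(self):
        """Random-walk step; rebinding self.coords is safe in this design."""
        self.coords = self.coords + np.random.random(self.coords.shape) - 0.5

class Point:
    def __init__(self, i, parent):
        self.i = i
        self.parent = parent  # reference to the owning PointSet, not the array

    @property
    def x(self):
        return self.parent.coords[self.i, 0]

    @property
    def y(self):
        return self.parent.coords[self.i, 1]

points = PointSet(1000)
point = points.points[10]
points.update()
print(point.x == points.coords[10, 0])  # True
```

The trade-off is one extra attribute lookup per access, plus a reference cycle between the set and its points, which Python's garbage collector handles but which is worth knowing about.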
Note the difference in how fast they run: on my machine, the numpy version takes 5 seconds versus 60 seconds for the pure-Python version.