Numpy: How To Quickly Normalize Many Vectors?
Solution 1:
Computing the magnitude
I came across this question and became curious about your method for normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last index (rows in this case, not columns).
magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
Typically, however, I just normalize like so:
vectors /= np.sqrt((vectors **2).sum(-1))[..., np.newaxis]
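Putting it together, here is a minimal sketch (with an arbitrarily chosen array shape) checking that the in-place version really leaves every row with unit length:
import numpy as np

vectors = np.random.rand(100, 25)  # arbitrary example data
vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]  # normalize each row in place
print(np.allclose(np.linalg.norm(vectors, axis=-1), 1.0))  # True -- every row now has unit norm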
A time comparison
I ran a test to compare the times, and found that my method is faster by quite a bit, but Freddie Witherdon's suggestion is even faster.
import numpy as np
vectors = np.random.rand(100, 25)
# OP's
%timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
# Output: 100 loops, best of 3: 2.39 ms per loop

# Mine
%timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
# Output: 10000 loops, best of 3: 13.8 us per loop

# Freddie's (from comment below)
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 10000 loops, best of 3: 6.45 us per loop
Beware though: as this StackOverflow answer notes, some safety checks do not happen with einsum, so you should be sure that the dtype of vectors is sufficient to store the square of the magnitudes accurately enough.
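To illustrate the kind of pitfall this warning is about, here is a small sketch with hypothetical int8 data, where each square still fits in the dtype but the sum of squares does not:
import numpy as np

small = np.array([[11, 11, 11]], dtype=np.int8)
print((small ** 2).sum(-1))                  # [363] -- sum() upcasts small integer dtypes to the platform int
print(np.einsum('...i,...i', small, small))  # can wrap around, as einsum accumulates in the input dtype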
Solution 2:
Well, unless I missed something, this does work:
vectors / norms
The problem with your suggestion lies in the broadcasting rules.
vectors # shape 2, 10
norms # shape 10
The shapes do not have the same length! So the rule is to first extend the smaller shape with ones on the left:
norms # shape 1,10
You can do that manually by calling:
vectors / norms.reshape(1,-1) # same as vectors/norms
If you wanted to compute vectors.T / norms, you would have to do the reshaping manually, as follows:
vectors.T / norms.reshape(-1,1) # this works
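Here is a minimal, self-contained sketch (arbitrary data, in the shapes discussed above) showing that both broadcast-compatible spellings give the same result:
import numpy as np

vectors = np.random.rand(2, 10)              # 10 column vectors of dimension 2
norms = np.sqrt((vectors ** 2).sum(axis=0))  # shape (10,), one norm per column
a = vectors / norms                          # norms is broadcast as shape (1, 10)
b = (vectors.T / norms.reshape(-1, 1)).T     # explicit reshape to (10, 1), then transpose back
print(np.allclose(a, b))                     # True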
Solution 3:
Alright: NumPy's array shape broadcasting adds dimensions to the left of the array shape, not to its right. NumPy can, however, be instructed to add a dimension to the right of the norms array:
print(vectors.T / norms[:, np.newaxis])
does work!
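For the shapes used here, the bookkeeping looks like this (a small sketch with arbitrary data):
import numpy as np

vectors = np.random.rand(2, 10)
norms = np.sqrt((vectors ** 2).sum(axis=0))
print(norms.shape, norms[:, np.newaxis].shape)   # (10,) (10, 1)
print((vectors.T / norms[:, np.newaxis]).shape)  # (10, 2) -- each row divided by its own norm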
Solution 4:
There is already a function for this in scikit-learn:
import sklearn.preprocessing as preprocessing

norm = preprocessing.normalize(m, norm='l2')  # m is the array of vectors to normalize
More info in the scikit-learn documentation for preprocessing.normalize.
Solution 5:
My preferred way to normalize vectors is to use NumPy's inner1d to calculate their magnitudes. Here is what has been suggested so far, compared against inner1d:
import numpy as np
from numpy.core.umath_tests import inner1d
COUNT = 10**6  # 1 million points
points = np.random.random_sample((COUNT, 3))

A = np.sqrt(np.einsum('...i,...i', points, points))
B = np.apply_along_axis(np.linalg.norm, 1, points)
C = np.sqrt((points ** 2).sum(-1))
D = np.sqrt((points * points).sum(axis=1))
E = np.sqrt(inner1d(points, points))

print([np.allclose(E, x) for x in [A, B, C, D]])  # [True, True, True, True]
Testing performance with cProfile:
import cProfile
cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))**0.5") # 3 function calls in 0.013 seconds
cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)') # 9000018 function calls in 10.977 seconds
cProfile.run('np.sqrt((points ** 2).sum(-1))') # 5 function calls in 0.028 seconds
cProfile.run('np.sqrt((points*points).sum(axis=1))') # 5 function calls in 0.027 seconds
cProfile.run('np.sqrt(inner1d(points,points))') # 2 function calls in 0.009 seconds
inner1d computed the magnitudes a hair faster than einsum. So using inner1d to normalize:
n = points/np.sqrt(inner1d(points,points))[:,None]
cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds
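For comparison, the same normalization spelled with einsum (timed above, and nearly as fast) gives identical results, reusing the points and n arrays from above:
n_einsum = points / np.sqrt(np.einsum('...i,...i', points, points))[:, None]
print(np.allclose(n, n_einsum))  # True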
Testing against scikit:
import sklearn.preprocessing as preprocessing
n_ = preprocessing.normalize(points, norm='l2')
cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
np.allclose(n,n_) # True
Conclusion: using inner1d seems to be the best option.