Order One Numpy Array By Another
Solution 1:
Specific case : Ints
For ints
, we could use bincount
-
np.repeat(order,np.bincount(a)[order])
Sample run -
In [146]: sorted(a, key=order.index)
Out[146]: [3, 3, 1, 1, 1, 4, 4, 2]
In [147]: np.repeat(order,np.bincount(a)[order])
Out[147]: array([3, 3, 1, 1, 1, 4, 4, 2])
Generic case
Approach #1
Generalizing for all dtypes with bincount
-
# https://stackoverflow.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
n = idx.size
sidx = np.empty(n,dtype=int)
sidx[idx] = np.arange(n)
return sidx
sidx = np.argsort(order)
c = np.bincount(np.searchsorted(order,a,sorter=sidx))
out = np.repeat(order, c[argsort_unique(sidx)])
Approach #2-A
With np.unique
and searchsorted
for the case when all elements from order
are in a
-
unq, count = np.unique(a, return_counts=True)
out= np.repeat(order, count[np.searchsorted(unq, order)])
Approach #2-B
To cover for all cases, we need one extra step -
unq, count = np.unique(a, return_counts=1)
sidx = np.searchsorted(unq, order)
out = np.repeat(order, np.where(unq[sidx] == order,count[sidx],0))
Solution 2:
Building on @Divakar's solution, you can count how many times each element occurs and then repeat the ordered elements that many times:
c= Counter(a)
np.repeat(order,[c[v]for v in order])
(You could vectorize the count lookup if you like). I like this because it's linear time, even if it's not pure numpy.
I guess a pure numpy equivalent would look like this:
count = np.unique(a, return_counts=True)[1]
np.repeat(order, count[np.argsort(np.argsort(order))])
But that's less direct, more code, and way too many sorts. :)
Solution 3:
This is a fairly direct conversion of your pure-Python approach into numpy. The key idea is replacing the order.index
function with a lookup in a sorted vector. Not sure if this is any simpler or faster than the solution you came up with, but it may generalize to some other cases.
import numpy as np
order = np.array([3, 1, 4, 2])
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])
# create sorted lookup vectors
ord = np.argsort(order)
order_sorted = order[ord]
indices_sorted = np.arange(len(order))[ord]
# lookup the index in `order` for each value in the `a` vector
a_indices = np.interp(a, order_sorted, indices_sorted).astype(int)
# sort `a` using the retrieved index values
a_sorted = a[np.argsort(a_indices)]
a_sorted
# array([3, 3, 1, 1, 1, 4, 4, 2])
This is a more direct way (based on this question), but it seems to be about 4 times slower than the np.interp
approach:
lookup_dict = dict(zip(order, range(len(order))))
indices = np.vectorize(lookup_dict.__getitem__)(a)
a_sorted = a[np.argsort(indices)]
Post a Comment for "Order One Numpy Array By Another"