Skip to content Skip to sidebar Skip to footer

Order One Numpy Array By Another

I have an array that determines an ordering of elements: order = [3, 1, 4, 2] And then I want to sort another, larger array (containing only those elements): a = np.array([4, 2, 1

Solution 1:

Specific case : Ints

For ints, we could use bincount -

np.repeat(order,np.bincount(a)[order])

Sample run -

In [146]: sorted(a, key=order.index)
Out[146]: [3, 3, 1, 1, 1, 4, 4, 2]

In [147]: np.repeat(order,np.bincount(a)[order])
Out[147]: array([3, 3, 1, 1, 1, 4, 4, 2])

Generic case

Approach #1

Generalizing for all dtypes with bincount -

# https://stackoverflow.com/a/41242285/ @Andras Deak
def argsort_unique(idx):
    n = idx.size
    sidx = np.empty(n,dtype=int)
    sidx[idx] = np.arange(n)
    return sidx

sidx = np.argsort(order)
c = np.bincount(np.searchsorted(order,a,sorter=sidx))
out = np.repeat(order, c[argsort_unique(sidx)])

Approach #2-A

With np.unique and searchsorted for the case when all elements from order are in a -

unq, count = np.unique(a, return_counts=True)
out= np.repeat(order, count[np.searchsorted(unq, order)])

Approach #2-B

To cover for all cases, we need one extra step -

unq, count = np.unique(a, return_counts=1)
sidx = np.searchsorted(unq, order)
out = np.repeat(order, np.where(unq[sidx] == order,count[sidx],0))

Solution 2:

Building on @Divakar's solution, you can count how many times each element occurs and then repeat the ordered elements that many times:

c= Counter(a)
np.repeat(order,[c[v]for v in order])

(You could vectorize the count lookup if you like). I like this because it's linear time, even if it's not pure numpy.

I guess a pure numpy equivalent would look like this:

count = np.unique(a, return_counts=True)[1]
np.repeat(order, count[np.argsort(np.argsort(order))])

But that's less direct, more code, and way too many sorts. :)

Solution 3:

This is a fairly direct conversion of your pure-Python approach into numpy. The key idea is replacing the order.index function with a lookup in a sorted vector. Not sure if this is any simpler or faster than the solution you came up with, but it may generalize to some other cases.

import numpy as np
order = np.array([3, 1, 4, 2])
a = np.array([4, 2, 1, 1, 4, 3, 1, 3])  

# create sorted lookup vectors
ord = np.argsort(order)
order_sorted = order[ord]
indices_sorted = np.arange(len(order))[ord]

# lookup the index in `order` for each value in the `a` vector
a_indices = np.interp(a, order_sorted, indices_sorted).astype(int)

# sort `a` using the retrieved index values
a_sorted = a[np.argsort(a_indices)]
a_sorted

# array([3, 3, 1, 1, 1, 4, 4, 2])

This is a more direct way (based on this question), but it seems to be about 4 times slower than the np.interp approach:

lookup_dict = dict(zip(order, range(len(order))))
indices = np.vectorize(lookup_dict.__getitem__)(a)
a_sorted = a[np.argsort(indices)]

Post a Comment for "Order One Numpy Array By Another"