
Speed Up Sub-array Shuffling And Storing

I have a list of integers (di), and another list (rang_indx) made up of numpy sub-arrays of integers (code below). For each of these sub-arrays, I need to store in a separate list

Solution 1:

Approach #1: The idea here is to do as little work as possible inside the loop, and to use only one loop -

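  1. Create a 2D random array with values in the interval [0,1), with one row per subarray and as many columns as the longest subarray.
  2. For each row, set the invalid places (the columns past the end of that subarray) to 1.0, then get the sorting indices for each row. The 1s corresponding to the invalid places stay at the back because there are no 1s in the original random array, so the valid indices occupy the front of each row. The code uses np.argpartition rather than a full argsort, which is enough to push the invalid places to the back.
  3. Slice each row of that indices array down to the length listed in di, capped at the length of the subarray.
  4. Loop over the rows and index each subarray from rang_indx with its sliced indices.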
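The masking trick in steps 1 and 2 can be seen on a toy example. This is only a sketch: the lengths in `lens` are made up, a seeded `np.random.default_rng` is used so the run is repeatable, and a full `np.argsort` stands in for the sorting step.

```python
import numpy as np

# Hypothetical sub-array lengths, chosen only for illustration.
lens = np.array([3, 1, 4])
max_len = lens.max()

# True where a column index falls past the end of that row's sub-array.
invalid_mask = lens[:, None] <= np.arange(max_len)

rng = np.random.default_rng(0)                 # seeded so the run is repeatable
rand_nums = rng.random((len(lens), max_len))   # values in [0, 1)
rand_nums[invalid_mask] = 1.0                  # padding sorts after every valid entry

order = np.argsort(rand_nums, axis=1)

for i, n in enumerate(lens):
    # The first n entries of row i are a permutation of 0..n-1.
    assert sorted(order[i, :n].tolist()) == list(range(n))
```

Because every valid entry is strictly less than 1.0, the padded columns can never land among the first `lens[i]` positions of row `i`.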

Hence, the implementation -

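import numpy as np

# di is assumed to be a NumPy array here; wrap it in np.asarray(di) if it is a plain list.
lens = np.array([len(i) for i in rang_indx])
# Never ask for more elements than a subarray holds
di0 = np.minimum(lens, di.astype(int))
# True for the columns past the end of each subarray
invalid_mask = lens[:,None] <= np.arange(lens.max())
rand_nums = np.random.rand(len(lens), lens.max())
rand_nums[invalid_mask] = 1
# After partitioning, the valid indices sit at the front of each row
shuffled_indx = np.argpartition(rand_nums, lens-1, axis=1)

out = []
for i,all_idx in enumerate(shuffled_indx):
    if lens[i]==0:
        out.append(np.array([]))
    else:
        slice_idx = all_idx[:di0[i]]
        out.append(rang_indx[i][slice_idx])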

Approach #2: Another way is to do most of the setup work inside the loop, but in an efficient manner -

lens = np.array([len(i) for i in rang_indx])
di0 = np.minimum(lens, di.astype(int))
out = []
for i in range(len(lens)):
    if lens[i]==0:
        out.append(np.array([]))
    else:
        k = di0[i]
        # Positions of the k smallest random keys: k distinct random indices
        slice_idx = np.argpartition(np.random.rand(lens[i]), k-1)[:k]
        out.append(rang_indx[i][slice_idx])
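For experimenting, the per-row loop of Approach #2 can be wrapped in a function and checked on small inputs. This is a minimal sketch, not the answer's own code: the name `sample_subarrays`, the `rng` parameter, and the sample data are hypothetical, and empty results are returned as `sub[:0]` (keeping the dtype) instead of `np.array([])`.

```python
import numpy as np

def sample_subarrays(rang_indx, di, rng=None):
    """Pick up to di[i] distinct elements at random from each sub-array."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for sub, want in zip(rang_indx, di):
        k = min(len(sub), int(want))   # cap the request at the sub-array length
        if k == 0:
            out.append(sub[:0])        # empty array with the same dtype
            continue
        keys = rng.random(len(sub))
        # The first k positions after partitioning hold the k smallest keys.
        pick = np.argpartition(keys, k - 1)[:k]
        out.append(sub[pick])
    return out

# Made-up inputs, for illustration only.
rang_indx = [np.array([10, 11, 12, 13]), np.array([], dtype=int), np.array([20, 21])]
di = np.array([2, 3, 5])

result = sample_subarrays(rang_indx, di, rng=np.random.default_rng(42))
for sub, want, got in zip(rang_indx, di, result):
    assert len(got) == min(len(sub), want)         # capped length
    assert len(set(got.tolist())) == len(got)      # no repeats
    assert set(got.tolist()) <= set(sub.tolist())  # drawn from the sub-array
```

Passing a seeded generator makes a run repeatable, which helps when comparing the two approaches on the same data.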
