Vectorization : Too Many Indices For Array

a=b=np.arange(9).reshape(3,3)
i=np.arange(3)
mask=a<i[:,None,None]+3
b[np.where(mask[0])]
>>>array([0, 1, 2])
b[np.where(mask[1])]
>>>array([0, 1, 2, 3])

Solution 1:

In [165]: a
Out[165]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [166]: mask
Out[166]: 
array([[[ True,  True,  True],
        [False, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True,  True, False],
        [False, False, False]]], dtype=bool)

So a (and b) is (3,3), while mask is (3,3,3).

A boolean mask applied to an array of the same shape produces a 1d array (indexing via where gives the same result):

In [170]: a[mask[1,:,:]]
Out[170]: array([0, 1, 2, 3])
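
As a quick check of that equivalence, here is a minimal sketch. The construction of `mask` is an assumption on my part (plane `k` marks `a < k + 3`); it was chosen because it reproduces the mask printed above.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
i = np.arange(3)
# Assumed construction: plane k is a < k + 3, matching the mask shown above.
mask = a < i[:, None, None] + 3

plane = mask[1]                  # one 2d boolean plane, shape (3, 3)
by_mask = a[plane]               # direct boolean indexing -> 1d array
by_where = a[np.where(plane)]    # tuple of two index arrays -> same 1d array

print(by_mask)                   # [0 1 2 3]
print(by_where)                  # [0 1 2 3]
```

Both forms select the same elements in the same (row-major) order, so the `where` call is optional when the mask and the array have the same shape.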

The where on the 2d mask produces a 2 element tuple, which can index the 2d array:

In [173]: np.where(mask[1,:,:])
Out[173]: (array([0, 0, 0, 1], dtype=int32), array([0, 1, 2, 0], dtype=int32))

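where on the 3d mask returns a 3 element tuple, one index array per dimension. Using that tuple to index the 2d a supplies one index too many, hence the 'too many indices' error: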

In [174]: np.where(mask)
Out[174]: 
(array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32),
 array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1], dtype=int32),
 array([0, 1, 2, 0, 1, 2, 0, 0, 1, 2, 0, 1], dtype=int32))
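
To see the failure in isolation, the sketch below indexes the 2d array with that 3 element tuple and catches the exception. As before, the way `mask` is built is my assumption, picked to match the mask displayed earlier.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
# Assumed construction of the (3, 3, 3) mask shown earlier.
mask = a < np.arange(3)[:, None, None] + 3

idx = np.where(mask)
print(len(idx), a.ndim)      # 3 index arrays, but a has only 2 dimensions

try:
    a[idx]                   # three indices for a two-dimensional array
    error = None
except IndexError as exc:
    error = exc

print(type(error).__name__)  # IndexError
```

The exact wording of the message differs between NumPy versions, so the sketch only checks the exception type.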

Let's try expanding a to 3d and applying the mask:

In [176]: np.tile(a[None,:],(3,1,1)).shape
Out[176]: (3, 3, 3)
In [177]: np.tile(a[None,:],(3,1,1))[mask]
Out[177]: array([0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4])

The values are all there, but they are joined into a single 1d array with no boundaries between the planes.
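
If the copy made by np.tile is a concern, np.broadcast_to should give the same flattened result from a read-only view. This is a sketch of an alternative, not what the session above used, and the `mask` construction is again assumed.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
# Assumed construction of the (3, 3, 3) mask shown earlier.
mask = a < np.arange(3)[:, None, None] + 3

tiled = np.tile(a[None, :], (3, 1, 1))   # allocates a real (3, 3, 3) copy
view = np.broadcast_to(a, mask.shape)    # read-only view, no data copied

flat = view[mask]                        # boolean indexing returns a new 1d array
print(flat)                              # [0 1 2 0 1 2 3 0 1 2 3 4]
print(np.array_equal(flat, tiled[mask])) # True
```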

We can count the number of True in each plane of mask, and use that to split the masked tile:

In [185]: mask.sum(axis=(1,2))
Out[185]: array([3, 4, 5])
In [186]: cnt=np.cumsum(mask.sum(axis=(1,2)))
In [187]: cnt
Out[187]: array([ 3,  7, 12], dtype=int32)

In [189]: np.split(np.tile(a[None,:],(3,1,1))[mask], cnt[:-1])
Out[189]: [array([0, 1, 2]), array([0, 1, 2, 3]), array([0, 1, 2, 3, 4])]
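
The count-and-split steps can be wrapped up as a small function. `split_by_plane` is a hypothetical name of my own, and it uses np.broadcast_to in place of np.tile; treat it as a sketch under the same assumed `mask` construction.

```python
import numpy as np

def split_by_plane(a, mask):
    """Hypothetical helper: apply each plane of a 3d mask to the 2d array a."""
    flat = np.broadcast_to(a, mask.shape)[mask]           # all selected values, joined
    counts = mask.reshape(mask.shape[0], -1).sum(axis=1)  # number of True per plane
    return np.split(flat, np.cumsum(counts)[:-1])         # cut at the plane boundaries

a = np.arange(9).reshape(3, 3)
# Assumed construction of the (3, 3, 3) mask shown earlier.
mask = a < np.arange(3)[:, None, None] + 3

parts = split_by_plane(a, mask)
print([p.tolist() for p in parts])   # [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
```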

Internally np.split uses a Python level iteration, so iterating over the mask planes directly might be just as good (6x faster on this small example):

In [190]: [a[m] for m in mask]
Out[190]: [array([0, 1, 2]), array([0, 1, 2, 3]), array([0, 1, 2, 3, 4])]
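
To compare the two approaches on your own machine, something like the following timeit sketch should do. The function names are mine, the `mask` construction is assumed, and the timings depend on hardware and NumPy version, so no numbers are shown.

```python
import timeit
import numpy as np

a = np.arange(9).reshape(3, 3)
# Assumed construction of the (3, 3, 3) mask shown earlier.
mask = a < np.arange(3)[:, None, None] + 3

def with_split():
    # tile, mask, then split at the cumulative per-plane counts
    cnt = np.cumsum(mask.sum(axis=(1, 2)))
    return np.split(np.tile(a[None, :], (3, 1, 1))[mask], cnt[:-1])

def with_loop():
    # plain Python iteration over the mask planes
    return [a[m] for m in mask]

t_split = timeit.timeit(with_split, number=2000)
t_loop = timeit.timeit(with_loop, number=2000)
print(f"split: {t_split:.4f}s  loop: {t_loop:.4f}s")
```

On a 3x3 array the fixed overhead of each call dominates, so the ratio may look quite different for larger inputs.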

That points to a fundamental problem with the desired 'vectorization': the individual arrays have shapes (3,), (4,) and (5,). Arrays of differing sizes are a strong indicator that true 'vectorization' is difficult, if not impossible.
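
One case where the ragged shapes can be sidestepped: if only a per-plane reduction is needed (a sum, say) rather than the selected values themselves, the result stays rectangular. This is a different technique from the one asked about, sketched here under the same assumed `mask` construction.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
# Assumed construction of the (3, 3, 3) mask shown earlier.
mask = a < np.arange(3)[:, None, None] + 3

# Keep the full (3, 3, 3) shape and zero out the unselected entries.
padded = np.where(mask, a, 0)
plane_sums = padded.sum(axis=(1, 2))          # one value per plane

# The same sums computed from the ragged per-plane arrays.
loop_sums = np.array([a[m].sum() for m in mask])

print(plane_sums)                             # [ 3  6 10]
```

The fill value has to be neutral for the reduction (0 for a sum), so this does not carry over to every operation.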
