
Numpy Dynamic Array Slicing Based On Min/max Values

I have a 3-dimensional array of shape (365, x, y), where 365 corresponds to daily data. In some cases, all the elements along the time axis (axis=0) are np.nan. The time series for eac

Solution 1:

While using masked arrays may not be the most efficient solution in this case, it will allow you to perform masked operations on specific axes while more-or-less preserving shape, which is a great convenience. Keep in mind that in many cases, the masked functions will still end up copying the masked data.

You have mostly the right idea in your current code, but you missed a couple of tricks, like being able to negate and combine masks. Allocating masks as boolean up front is also more efficient, and there are little nitpicks like np.full(..., 0) -> np.zeros(..., dtype=bool).
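As a small illustration of those mask tricks, here is a sketch with made-up data (the names int_mask, bool_mask and nan_mask are only for this example). It shows why the boolean dtype matters when you negate, and how two masks combine:

```python
import numpy as np

n = 6
int_mask = np.full(n, 0)              # default integer dtype, not bool
bool_mask = np.zeros(n, dtype=bool)   # boolean from the start

bool_mask[3:] = True                  # flag the second half

# On a boolean mask, ~ is a logical NOT.
negated = ~bool_mask
# On an integer array, ~ is a bitwise NOT, so 0 becomes -1.
bitwise = ~int_mask

# Boolean masks combine element-wise with | (or) and & (and).
nan_mask = np.array([False, True, False, False, True, False])
combined = bool_mask | nan_mask
```

The integer version only behaves like a mask after a cast to bool, which is one more reason to allocate with dtype=bool right away.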

Let's work through this backwards. Let's say you had a well-behaved 1-D array with a peak, say a1. You can use masking to easily find the maxima and minima (or indices) like this:

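import numpy as np

peak_index = np.nanargmax(a1)
mask = np.zeros(a1.size, dtype=bool)  # plain bool works on every NumPy version
mask[peak_index:] = True  # True from the peak onwards
trough_plus = np.nanargmin(np.ma.array(a1, mask=~mask))  # hides everything before the peak
trough_minus = np.nanargmin(np.ma.array(a1, mask=mask))  # hides the peak and everything after it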

This respects the fact that masked arrays flip the sense of the mask relative to normal numpy boolean indexing. It's also OK that the maximum value appears in the calculation of trough_plus, since it's guaranteed not to be a minimum (unless you have the all-nan situation).
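To make the flipped mask sense concrete, here is a minimal sketch on a hand-made array without NaNs (the values are invented, and it uses the masked array's own argmin method rather than the nan functions):

```python
import numpy as np

a1 = np.array([3.0, 1.0, 4.0, 9.0, 5.0, 2.0, 6.0])

peak_index = int(np.argmax(a1))
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True

# Boolean indexing: True means "select this element".
selected = a1[mask]

# Masked arrays: True means "hide this element".
left_side = np.ma.array(a1, mask=mask)     # only elements before the peak stay visible
right_side = np.ma.array(a1, mask=~mask)   # only the peak and later elements stay visible

trough_minus = int(left_side.argmin())
trough_plus = int(right_side.argmin())
```

The returned indices refer to positions in the full array, not to positions within the visible part, which is what makes the masked version convenient here.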

Now if a1 was a masked array already (but still 1D), you could do the same thing, but combine the masks temporarily. For example:

a1 = np.ma.array(a1, mask=np.isnan(a1))
peak_index = a1.argmax()
mask = np.zeros(a1.size, dtype=bool)
mask[peak_index:] = True
trough_plus = np.ma.masked_array(a1, mask=a1.mask | ~mask).argmin()
trough_minus = np.ma.masked_array(a1, mask=a1.mask | mask).argmin()

Again, since masked arrays have reversed masks, it's important to combine the masks using | rather than the & you would use for normal numpy boolean masks. In this case, there is no need for calling the nan version of argmax and argmin, since all the nans are already masked out.
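Here is the same idea as a runnable sketch on a small made-up array with NaNs. The names after and hidden_with_and are just for illustration:

```python
import numpy as np

raw = np.array([np.nan, 2.0, 0.5, 7.0, np.nan, 1.0, 3.0])
a1 = np.ma.array(raw, mask=np.isnan(raw))

peak_index = int(a1.argmax())          # NaNs are masked, so they are ignored
after = np.zeros(a1.size, dtype=bool)
after[peak_index:] = True              # True from the peak onwards

# Hide an element if it is NaN OR on the wrong side of the peak.
trough_plus = int(np.ma.masked_array(a1, mask=a1.mask | ~after).argmin())
trough_minus = int(np.ma.masked_array(a1, mask=a1.mask | after).argmin())

# With & only elements that are NaN AND after the peak would be hidden,
# which leaves almost the whole array visible.
hidden_with_and = int((a1.mask & after).sum())
```

With | the hidden set grows to cover both conditions, which is what you want when True means "ignore".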

Hopefully, the generalization to multiple dimensions becomes clear from here, given the prevalence of the axis keyword in numpy functions:

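a = np.ma.array(a, mask=np.isnan(a))
# peak index per column, with a leading axis of length 1 so it broadcasts along time
peak_indices = a.argmax(axis=0).reshape(1, *a.shape[1:])
# time index of shape (n, 1, 1, ...) compared against the peaks: True from each peak onwards
mask = np.arange(a.shape[0]).reshape(-1, *((1,) * (a.ndim - 1))) >= peak_indices

trough_plus = np.ma.masked_array(a, mask=~mask | a.mask).argmin(axis=0)
trough_minus = np.ma.masked_array(a, mask=mask | a.mask).argmin(axis=0)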

The N-dimensional masking technique comes from the question "Fill mask efficiently based on start indices", which was asked just for this purpose.
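As a sanity check, here is a sketch of the multi-dimensional version on a tiny hand-made array of shape (6, 1, 2), so the answers can be verified by eye. The data and the names col0, col1 and after are invented for the example:

```python
import numpy as np

col0 = [2.0, 1.0, 5.0, 9.0, 3.0, 4.0]
col1 = [np.nan, 4.0, 8.0, 6.0, np.nan, 0.0]
# Time runs along axis 0; each of the two columns is one time series.
data = np.stack([col0, col1], axis=-1).reshape(6, 1, 2)

a = np.ma.array(data, mask=np.isnan(data))
peak_indices = a.argmax(axis=0)                      # shape (1, 2)

# Time index of shape (6, 1, 1) broadcast against the per-column peaks.
time_index = np.arange(a.shape[0]).reshape(-1, 1, 1)
after = time_index >= peak_indices[np.newaxis]       # True from each peak onwards

trough_plus = np.ma.masked_array(a, mask=~after | a.mask).argmin(axis=0)
trough_minus = np.ma.masked_array(a, mask=after | a.mask).argmin(axis=0)
```

Note that a column consisting only of NaNs is fully masked and would need separate handling; this sketch deliberately leaves that case out.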

Solution 2:

Here is a method that

  1. copies the data
  2. saves all nan positions and replaces all nans with global min-1
  3. finds the rowwise argmax
  4. subtracts its value from the entire row
    • note that each row now has only non-positive values with the max value now being zero
  5. zeros all nan positions
  6. flips the sign of all values right of the max
    • this is the main idea; it creates a new row-global max at the position where before there was the right hand min; at the same time it ensures that the left hand min is now row-global
  7. retrieves the rowwise argmin and argmax, these are the positions of the left and right mins in the original array
  8. finds all-nan rows and overwrites the max and min indices at these positions with INVALINT
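To see why the sign flip works, here is a hand-sized 1-D trace of steps 1 to 7 on a single made-up row (the all-nan bookkeeping of step 8 is left out, and the values are invented for illustration):

```python
import numpy as np

row = np.array([4.0, 1.0, np.nan, 9.0, 2.0, 5.0])

inval = np.isnan(row)
# Steps 1-2: copy, replacing NaN by (global min - 1) so it can never be the max.
b = np.where(inval, np.nanmin(row) - 1, row)   # [4, 1, 0, 9, 2, 5]
# Step 3: position of the peak.
pk = int(b.argmax())
# Step 4: subtract the peak value, everything is now <= 0.
b -= b[pk]                                     # [-5, -8, -9, 0, -7, -4]
# Step 5: zero the NaN positions so they cannot win the argmin.
b[inval] = 0                                   # [-5, -8, 0, 0, -7, -4]
# Step 6: flip the sign right of the peak.
t = np.arange(row.size)
b[t > pk] *= -1                                # [-5, -8, 0, 0, 7, 4]
# Step 7: the left trough is the global min, the right trough the global max.
ltr = int(b.argmin())
rtr = int(b.argmax())
```

After the flip, the smallest value to the right of the peak (2.0 at index 4) has become the largest entry of the row, so a single argmax finds it.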

Code:

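import numpy as np

INVALINT = -9999
t,x,y = a.shape
t,x,y = np.ogrid[:t,:x,:y]  # open index grids, reused for fancy indexing below
inval = np.isnan(a)
b = np.where(inval,np.nanmin(a)-1,a)  # copy with nans replaced by global min-1
pk = b.argmax(axis=0)
pkval = b[pk,x,y]  # peak value of each series
b -= pkval
b[inval] = 0
b[t>pk[None]] *= -1  # flip the sign right of each peak
ltr = b.argmin(axis=0)
rtr = b.argmax(axis=0)
del b
inval = inval.all(axis=0)  # series that are nan everywhere
pk[inval] = INVALINT
ltr[inval] = INVALINT
rtr[inval] = INVALINT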

# result is now in ltr ("left trough"), pk ("peak") and rtr ("right trough")
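For a quick check, the same steps can be wrapped in a function and run on a tiny array. This is only a sketch: the function name peak_and_troughs and the test data are made up, and it assumes a 3-D float array with time on axis 0 and at least one non-nan value overall.

```python
import numpy as np

def peak_and_troughs(a, invalid=-9999):
    """Return (left trough, peak, right trough) indices along axis 0."""
    t, x, y = np.ogrid[:a.shape[0], :a.shape[1], :a.shape[2]]
    isnan = np.isnan(a)
    b = np.where(isnan, np.nanmin(a) - 1, a)   # nans can never be the max
    pk = b.argmax(axis=0)
    b -= b[pk, x, y]                           # peak of each series becomes 0
    b[isnan] = 0
    b[t > pk[None]] *= -1                      # flip the sign right of each peak
    ltr = b.argmin(axis=0)
    rtr = b.argmax(axis=0)
    all_nan = isnan.all(axis=0)
    for idx in (ltr, pk, rtr):
        idx[all_nan] = invalid                 # mark series that are nan everywhere
    return ltr, pk, rtr

col0 = [2.0, 1.0, 5.0, 9.0, 3.0, 4.0]
col1 = [np.nan, 4.0, 8.0, 6.0, np.nan, 0.0]
col2 = [np.nan] * 6                            # an all-nan series
a = np.array([col0, col1, col2]).T.reshape(6, 1, 3)

ltr, pk, rtr = peak_and_troughs(a)
```

The third series is all nan, so its three indices come back as the invalid marker instead of a misleading 0.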
