How do I find the nearest value in a numpy array? For example:

np.find_nearest(array, value)

Current answer

Here is a version that handles a non-scalar array of "values":

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Here is a version that returns a plain numeric value (e.g. int, float) when the input is a scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]
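A quick check of the scalar/array behaviour of the second version, repeated here so the snippet runs on its own (the sample array is made up for illustration):

```python
import numpy as np

def find_nearest(array, values):
    # Same as the second version above: one nearest element per query value
    values = np.atleast_1d(values)
    # |array[i] - values[j]| for every pair, shape (len(array), len(values))
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

array = np.array([0.0, 0.7, 2.1, 3.5])
print(find_nearest(array, 1.95))        # a single numpy float
print(find_nearest(array, [0.2, 3.0]))  # an array with one entry per query
```

One quirk worth knowing: because the check is `len(out) > 1`, a one-element list of queries also comes back as a bare scalar.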

Other answers

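Summary of this answer: if you have a sorted array, the bisection code (shown below) is the fastest: 100-1000 times faster for large arrays and 2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array, then for a large array you should first consider an O(n log n) sort followed by bisection; for a small array, method 2 seems to be the fastest.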

First, you should clarify what you mean by "nearest value". Often one wants the interval on an abscissa that contains the value: e.g. for array=[0,0.7,2.1] and value=1.95, the answer would be idx=1, since 0.7 <= 1.95 < 2.1. I suspect this is the case you need (otherwise, once you have found the interval, the code below is easy to modify with a follow-up conditional statement). The optimal way to do this is with bisection, which I provide first; note that it does not require numpy at all and is faster than the numpy functions, because those perform redundant operations. Then I give a timing comparison against the solutions from the other answers here.
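To make the distinction concrete with the example from the paragraph above, the interval index can be computed with Python's standard `bisect` module (used here as a stand-in; it is not the bisection function that follows), and compared with a plain nearest-point search on the same data:

```python
from bisect import bisect_right

array = [0, 0.7, 2.1]
value = 1.95

# Interval: largest j with array[j] <= value, i.e. array[j] <= value < array[j+1]
interval_idx = bisect_right(array, value) - 1

# Nearest point: index of the element with the smallest absolute difference
nearest_idx = min(range(len(array)), key=lambda i: abs(array[i] - value))

print(interval_idx, nearest_idx)  # 1 2
```

The two notions disagree here: 1.95 lies in the interval starting at 0.7, but its nearest array element is 2.1.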

The bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if value < array[0]:
        return -1
    elif value > array[n - 1]:
        return n
    jl = 0      # Initialize lower
    ju = n - 1  # and upper limits.
    while ju - jl > 1:  # If we are not yet done,
        jm = (ju + jl) >> 1  # compute a midpoint with a bitshift
        if value >= array[jm]:
            jl = jm  # and replace either the lower limit
        else:
            ju = jm  # or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if value == array[0]:  # edge cases at bottom
        return 0
    elif value == array[n - 1]:  # and top
        return n - 1
    else:
        return jl
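For many query values at once, the same interval convention can be reproduced with `np.searchsorted(..., side="right") - 1` plus one fix-up for values above the range. This is a sketch assuming `array` is strictly increasing; the name `bisection_many` is hypothetical:

```python
import numpy as np

def bisection_many(array, values):
    """Vectorized counterpart of bisection() for a strictly increasing array.

    Returns j with array[j] <= v < array[j+1]; -1 below the range,
    len(array) above it, and len(array)-1 when v == array[-1].
    """
    array = np.asarray(array)
    values = np.asarray(values)
    # side="right" counts elements <= v, so subtracting 1 gives the interval start
    idx = np.searchsorted(array, values, side="right") - 1
    # bisection() returns n (not n-1) for values strictly above the last element
    return np.where(values > array[-1], len(array), idx)

print(bisection_many([0, 0.7, 2.1], [-1, 0, 0.5, 1.95, 2.1, 3]))
```

For those queries it gives -1, 0, 0, 1, 2, 3, matching what `bisection` returns one value at a time.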

Now I'll define the code from the other answers; each of them returns an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

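Now I'll time the code. Note that methods 1, 2, 4 and 5 do not give the interval correctly: methods 1, 2 and 4 round to the nearest point in the array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3 and 6, and of course bisection, give the correct interval.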

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

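For a large array, bisection takes about 4.4 µs, while the numpy-based methods take between 259 µs (method 4) and 1.21 ms (method 5), and the pure-Python method 1 takes 180 ms. For smaller arrays, bisection is 2-100 times faster.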
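`%timeit` is an IPython magic, so the timing code above needs IPython or Jupyter. In a plain Python script, the standard `timeit` module gives a rough equivalent. This sketch times only two of the approaches: the vectorized argmin (`find_nearest4` above) and a `np.searchsorted` lookup on the sorted array (`nearest_sorted` is a hypothetical name). The printed numbers will depend on the machine.

```python
import timeit
import numpy as np

array = np.arange(100000)
val = array[50000] + 0.55

def find_nearest4(array, value):
    # Linear scan: O(n) work on every call
    return (np.abs(array - value)).argmin()

def nearest_sorted(array, value):
    # Binary search on a sorted array: O(log n) per call
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or abs(value - array[idx - 1]) < abs(value - array[idx])):
        return idx - 1
    return idx

number = 100
for f in (find_nearest4, nearest_sorted):
    # repeat() returns the total time of each run; keep the best run, like %timeit's "best of 3"
    best = min(timeit.repeat(lambda: f(array, val), number=number, repeat=3))
    print(f"{f.__name__}: {best / number * 1e6:.1f} µs per loop, result {f(array, val)}")
```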

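For a 2D array, this returns the element closest to value:

import numpy as np

def find_nearest(array, value):
    array = np.asarray(array)
    # absolute distance from every element to value
    z = np.abs(array - value)
    # (row, column) of the first element at the minimum distance (2D arrays only)
    row, col = np.argwhere(z == z.min())[0]
    return array[row, col]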

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))
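The version above unpacks exactly two coordinates, so it only handles 2D arrays. A sketch of an n-dimensional variant uses `np.unravel_index` to turn the flat argmin position back into an index tuple of any length (`find_nearest_nd` is a hypothetical name; ties go to the first element in row-major order, as before):

```python
import numpy as np

def find_nearest_nd(array, value):
    array = np.asarray(array)
    # argmin over the flattened array, then convert to an index tuple of any length
    idx = np.unravel_index(np.abs(array - value).argmin(), array.shape)
    return array[idx], idx

array = np.array([[60, 200, 30], [3, 30, 50], [20, 1, -50], [20, -500, 11]])
print(find_nearest_nd(array, 0))
```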

If your array is already sorted and very large, this is a much faster solution:

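import math
import numpy as np

def find_nearest(array, value):
    # binary search: position where value would be inserted to keep array sorted
    idx = np.searchsorted(array, value, side="left")
    # take the left neighbour if value lies past the end or the left one is strictly closer
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]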

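This scales to very large arrays. If you cannot assume that the array is already sorted, the above is easy to modify so that it sorts inside the method. That is overkill for small arrays, but once they get large it is much faster.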
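The modification mentioned above might look like the following sketch: argsort once, run `np.searchsorted` on the sorted copy for all query values at once, and map the winning positions back to indices in the original, unsorted array. The name `find_nearest_unsorted` is hypothetical, and the sketch assumes at least two elements.

```python
import numpy as np

def find_nearest_unsorted(array, values):
    """Index into `array` (1D, any order, at least 2 elements) of the element
    nearest to each entry of `values`; ties go to the smaller element."""
    array = np.asarray(array)
    values = np.atleast_1d(values)
    order = np.argsort(array)          # O(n log n) sort, done once per call
    sorted_arr = array[order]
    # Insertion positions in the sorted copy, clipped so idx-1 and idx are both valid
    idx = np.clip(np.searchsorted(sorted_arr, values, side="left"), 1, len(sorted_arr) - 1)
    left, right = sorted_arr[idx - 1], sorted_arr[idx]
    # Move to the left neighbour wherever it is at least as close as the right one
    idx = np.where(values - left <= right - values, idx - 1, idx)
    return order[idx]                  # map sorted positions back to original indices

a = np.array([4.0, 1.0, 9.0, 6.5])
idx = find_nearest_unsorted(a, [0.0, 5.0, 6.0, 100.0])
print(idx, a[idx])
```

The sort dominates the cost, so this pays off when one call answers many query values; if the same array is queried repeatedly, the `argsort` could be done once outside the function.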

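Here is a version for 2D arrays that uses scipy's cdist function if it is available, and falls back to a simpler distance calculation if it is not.

By default, the output is the index of the entry closest to the input value, but you can change this with the output keyword to one of 'index', 'value' or 'both', where 'value' returns array[index] and 'both' returns index, array[index].

For very large arrays you may need to use kind='euclidean', because the default scipy cdist function may run out of memory.

This may not be the absolute fastest solution, but it is pretty close.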

def find_nearest_2d(array, value, kind='cdist', output='index'):
    # 'array' must be a 2D array of points, one point per row
    # 'value' must be a 1D array with one element per column (e.g. 2 elements)
    # 'kind' defines what method to use to calculate the distances. Can choose one
    #    of 'cdist' (default) or 'euclidean'. Choose 'euclidean' for very large
    #    arrays. Otherwise, cdist is much faster.
    # 'output' defines what the output should be. Can be 'index' (default) to return
    #    the index of the array that is closest to the value, 'value' to return the
    #    value that is closest, or 'both' to return index,value
    import numpy as np
    array = np.asarray(array)
    value = np.asarray(value)
    if kind not in ('cdist', 'euclidean'):
        raise ValueError("Keyword 'kind' must be one of 'cdist' or 'euclidean'")
    if kind == 'cdist':
        try:
            from scipy.spatial.distance import cdist
        except ImportError:
            print("Warning (find_nearest_2d): Could not import cdist. Reverting to simpler distance calculation")
            kind = 'euclidean'
    # Shortcut: a row equal to value in every coordinate is an exact match
    exact = np.where(np.all(array == value, axis=1))[0]
    if exact.size > 0:
        index = exact[0]
    elif kind == 'cdist':
        index = np.argmin(cdist(value[None, :], array)[0])
    else:
        # Squared Euclidean distances have the same argmin as the distances themselves
        index = np.argmin(np.sum((array - value)**2., axis=1))
    if output == 'index': return index
    elif output == 'value': return array[index]
    elif output == 'both': return index, array[index]
    else: raise ValueError("Keyword 'output' must be one of 'index', 'value', or 'both'")
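The 'euclidean' branch above can be extended to many query points at once using only NumPy broadcasting. This is a sketch (`nearest_rows` and its `chunk_size` parameter are hypothetical); it processes the queries in chunks so that the temporary difference array stays bounded, which addresses the same memory concern raised above for cdist.

```python
import numpy as np

def nearest_rows(points, queries, chunk_size=1024):
    """Index of the nearest row of `points` (n x d) for each row of `queries` (m x d)."""
    points = np.asarray(points, dtype=float)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    out = np.empty(len(queries), dtype=np.intp)
    # Process queries in chunks so the temporary array is at most chunk_size x n x d
    for start in range(0, len(queries), chunk_size):
        q = queries[start:start + chunk_size]
        # Squared distances have the same argmin as Euclidean distances
        d2 = ((q[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk_size] = d2.argmin(axis=1)
    return out

points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 2.0]])
print(nearest_rows(points, [[0.9, 1.2], [4.0, 4.0], [-1.0, 0.0]]))
```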

This function handles any number of queries using numpy searchsorted, so once the input arrays are sorted it is just as fast. It also works on regular grids in 2d, 3d, ...:

#!/usr/bin/env python3
# keywords: nearest-neighbor regular-grid python numpy searchsorted Voronoi

import numpy as np

#...............................................................................
class Near_rgrid( object ):
    """ nearest neighbors on a Manhattan aka regular grid
    1d:
    near = Near_rgrid( x: sorted 1d array )
    nearix = near.query( q: 1d ) -> indices of the points x_i nearest each q_i
        x[nearix[0]] is the nearest to q[0]
        x[nearix[1]] is the nearest to q[1] ...
        nearpoints = x[nearix] is near q
    If A is an array of e.g. colors at x[0] x[1] ...,
    A[nearix] are the values near q[0] q[1] ...
    Query points < x[0] snap to x[0], similarly > x[-1].

    2d: on a Manhattan aka regular grid,
        streets running east-west at y_i, avenues north-south at x_j,
    near = Near_rgrid( y, x: sorted 1d arrays, e.g. latitude longitude )
    I, J = near.query( q: nq × 2 array, columns qy qx )
    -> nq × 2 indices of the gridpoints y_i x_j nearest each query point
        gridpoints = np.column_stack(( y[I], x[J] ))  # e.g. street corners
        diff = gridpoints - querypoints
        distances = norm( diff, axis=1, ord= )
    Values at an array A defined at the gridpoints y_i x_j nearest q: A[I,J]

    3d: Near_rgrid( z, y, x: 1d axis arrays ) .query( q: nq × 3 array )

    See Howitworks below, and the plot Voronoi-random-regular-grid.
    """

    def __init__( self, *axes: "1d arrays" ):
        axarrays = []
        for ax in axes:
            axarray = np.asarray( ax ).squeeze()
            assert axarray.ndim == 1, "each axis should be 1d, not %s " % (
                    str( axarray.shape ))
            axarrays += [axarray]
        self.midpoints = [_midpoints( ax ) for ax in axarrays]
        self.axes = axarrays
        self.ndim = len(axes)

    def query( self, queries: "nq × dim points" ) -> "nq × dim indices":
        """ -> the indices of the nearest points in the grid """
        queries = np.asarray( queries ).squeeze()  # or list x y z ?
        if self.ndim == 1:
            assert queries.ndim <= 1, queries.shape
            return np.searchsorted( self.midpoints[0], queries )  # scalar, 0d ?
        queries = np.atleast_2d( queries )
        assert queries.shape[1] == self.ndim, [
                queries.shape, self.ndim]
        return [np.searchsorted( mid, q )  # parallel: k axes, k processors
                for mid, q in zip( self.midpoints, queries.T )]

    def snaptogrid( self, queries: "nq × dim points" ):
        """ -> the nearest points in the grid, 2d [[y_j x_i] ...] """
        ix = self.query( queries )
        if self.ndim == 1:
            return self.axes[0][ix]
        else:
            axix = [ax[j] for ax, j in zip( self.axes, ix )]
            return np.column_stack( axix )  # nq × dim, one row [y_j x_i] per query


def _midpoints( points: "array-like 1d, *must be sorted*" ) -> "1d":
    points = np.asarray( points ).squeeze()
    assert points.ndim == 1, points.shape
    diffs = np.diff( points )
    assert np.nanmin( diffs ) > 0, "the input array must be sorted, not %s " % (
            points.round( 2 ))
    return (points[:-1] + points[1:]) / 2  # floats

#...............................................................................
Howitworks = \
"""
How Near_rgrid works in 1d:
Consider the midpoints halfway between fenceposts | | |
The interval [left midpoint .. | .. right midpoint] is what's nearest each post --

    |   |       |                     |   points
    | . |   .   |          .          |   midpoints
      ^^^^^^               .            nearest points[1]
            ^^^^^^^^^^^^^^^             nearest points[2]  etc.

2d:
    I, J = Near_rgrid( y, x ).query( q )
    I = nearest in `y`
    J = nearest in `x` independently / in parallel.
    The points nearest [yi xj] in a regular grid (its Voronoi cell)
    form a rectangle [left mid x .. right mid x] × [left mid y .. right mid y]
    (in any norm ?)
    See the plot Voronoi-random-regular-grid.

Notes
-----
If a query point is exactly halfway between two data points,
e.g. on a grid of ints, the lines (x + 1/2) U (y + 1/2),
which "nearest" you get is implementation-dependent, unpredictable.

"""

Murky = \
""" NaNs in points, in queries ?
"""

__version__ = "2021-10-25 oct  denis-bz-py"
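The core idea described in Howitworks fits in a few lines without the class: on each sorted axis, the midpoints between neighbouring grid values are the boundaries of each value's nearest region, so `np.searchsorted` on the midpoints yields the nearest index directly. A minimal sketch, assuming strictly increasing axes; `nearest_on_axes` is a hypothetical name, not part of Near_rgrid, and the sample grid is made up.

```python
import numpy as np

def nearest_on_axes(axes, queries):
    """For a regular grid given by sorted 1d `axes`, return one index array per axis."""
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    indices = []
    for ax, q in zip(axes, queries.T):
        ax = np.asarray(ax, dtype=float)
        mid = (ax[:-1] + ax[1:]) / 2             # boundaries between neighbouring grid values
        indices.append(np.searchsorted(mid, q))  # number of midpoints strictly below q
    return indices

y = np.array([0.0, 10.0])
x = np.array([0.0, 1.0, 3.0, 7.0])
I, J = nearest_on_axes([y, x], [[4.0, 2.5], [12.0, -1.0], [6.0, 5.5]])
print(np.column_stack((y[I], x[J])))  # snapped grid points, one row per query
```

A query lying exactly on a midpoint gets the lower index here, which is one instance of the implementation-dependent tie-breaking mentioned in the Notes.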