NumPy提出了一种通过np.argmax获取数组最大值索引的方法。

我想要一个类似的东西,但返回N个最大值的索引。

例如,如果我有一个数组[1,3,2,4,5],那么nargmax(array, n=3)将返回对应于元素[5,4,3]的下标[4,3,1]。


当前回答

Use:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

它也适用于2D数组。例如,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

其他回答

如果你碰巧在使用一个多维数组,那么你需要平展和解开索引:

def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

例如:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

对于多维数组,可以使用axis关键字,以便沿着预期的轴应用分区。

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

对于抓取物品:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

但请注意,这不会返回一个排序的结果。在这种情况下,你可以沿着预期的轴使用np.argsort():

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

这里有一个例子:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

对于常规的Python列表:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

如果使用Python 2,请使用xrange而不是range。

来源:堆队列算法

当top_k<<axis_length时,它优于argsort。

import numpy as np

def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
    if reverse:
        axis_length = array.shape[axis]
        partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
                                  range(axis_length - top_k, axis_length), axis)
    else:
        partition_index = np.take(np.argpartition(array, kth=top_k, axis=axis), range(0, top_k), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    # resort partition
    sorted_index = np.argsort(top_scores, axis=axis)
    if reverse:
        sorted_index = np.flip(sorted_index, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indexes

if __name__ == "__main__":
    import time
    from sklearn.metrics.pairwise import cosine_similarity

    x = np.random.rand(10, 128)
    y = np.random.rand(1000000, 128)
    z = cosine_similarity(x, y)
    start_time = time.time()
    sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
    print(time.time() - start_time)

我认为最省时的方法是手动遍历数组并保持k-size的min-heap,正如其他人所提到的那样。

我还想出了一个蛮力方法:

top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')

在使用argmax获取其索引后,将最大的元素设置为一个较大的负值。然后argmax的下一次调用将返回第二大的元素。 您可以记录这些元素的原始值,并在需要时恢复它们。