NumPy提出了一种通过np.argmax获取数组最大值索引的方法。
我想要一个类似的东西,但返回N个最大值的索引。
例如,如果我有一个数组[1,3,2,4,5],那么nargmax(array, n=3)将返回对应于元素[5,4,3]的下标[4,3,1]。
NumPy提出了一种通过np.argmax获取数组最大值索引的方法。
我想要一个类似的东西,但返回N个最大值的索引。
例如,如果我有一个数组[1,3,2,4,5],那么nargmax(array, n=3)将返回对应于元素[5,4,3]的下标[4,3,1]。
当前回答
Use:
def max_indices(arr, k):
'''
Returns the indices of the k first largest elements of arr
(in descending order in values)
'''
assert k <= arr.size, 'k should be smaller or equal to the array size'
arr_ = arr.astype(float) # make a copy of arr
max_idxs = []
for _ in range(k):
max_element = np.max(arr_)
if np.isinf(max_element):
break
else:
idx = np.where(arr_ == max_element)
max_idxs.append(idx)
arr_[idx] = -np.inf
return max_idxs
它也适用于2D数组。例如,
In [0]: A = np.array([[ 0.51845014, 0.72528114],
[ 0.88421561, 0.18798661],
[ 0.89832036, 0.19448609],
[ 0.89832036, 0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
[(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
(array([1], dtype=int64), array([0], dtype=int64)),
(array([0], dtype=int64), array([1], dtype=int64)),
(array([0], dtype=int64), array([0], dtype=int64)),
(array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
(array([1], dtype=int64), array([1], dtype=int64))]
In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])
其他回答
如果你碰巧在使用一个多维数组,那么你需要平展和解开索引:
def largest_indices(ary, n):
"""Returns the n largest indices from a numpy array."""
flat = ary.flatten()
indices = np.argpartition(flat, -n)[-n:]
indices = indices[np.argsort(-flat[indices])]
return np.unravel_index(indices, ary.shape)
例如:
>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0. , 0.84147098, 0.90929743],
[ 0.14112001, -0.7568025 , -0.95892427],
[-0.2794155 , 0.6569866 , 0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825, 0.90929743, 0.84147098])
对于多维数组,可以使用axis关键字,以便沿着预期的轴应用分区。
# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]
对于抓取物品:
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)
但请注意,这不会返回一个排序的结果。在这种情况下,你可以沿着预期的轴使用np.argsort():
indices = np.argsort(arr, axis=1)[:, -N:]
# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)
这里有一个例子:
In [42]: a = np.random.randint(0, 20, (10, 10))
In [44]: a
Out[44]:
array([[ 7, 11, 12, 0, 2, 3, 4, 10, 6, 10],
[16, 16, 4, 3, 18, 5, 10, 4, 14, 9],
[ 2, 9, 15, 12, 18, 3, 13, 11, 5, 10],
[14, 0, 9, 11, 1, 4, 9, 19, 18, 12],
[ 0, 10, 5, 15, 9, 18, 5, 2, 16, 19],
[14, 19, 3, 11, 13, 11, 13, 11, 1, 14],
[ 7, 15, 18, 6, 5, 13, 1, 7, 9, 19],
[11, 17, 11, 16, 14, 3, 16, 1, 12, 19],
[ 2, 4, 14, 8, 6, 9, 14, 9, 1, 5],
[ 1, 10, 15, 0, 1, 9, 18, 2, 2, 12]])
In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
[2, 7, 5, 9, 6, 8, 1, 0, 4],
[5, 8, 1, 9, 7, 3, 6, 2, 4],
[4, 5, 2, 6, 3, 9, 0, 8, 7],
[7, 2, 6, 4, 1, 3, 8, 5, 9],
[2, 3, 5, 7, 6, 4, 0, 9, 1],
[4, 3, 0, 7, 8, 5, 1, 2, 9],
[5, 2, 0, 8, 4, 6, 3, 1, 9],
[0, 1, 9, 4, 3, 7, 5, 2, 6],
[0, 4, 7, 8, 5, 1, 9, 2, 6]])
In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
[1, 0, 4],
[6, 2, 4],
[0, 8, 7],
[8, 5, 9],
[0, 9, 1],
[1, 2, 9],
[3, 1, 9],
[5, 2, 6],
[9, 2, 6]])
In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
[16, 16, 18],
[13, 15, 18],
[14, 18, 19],
[16, 18, 19],
[14, 14, 19],
[15, 18, 19],
[16, 17, 19],
[ 9, 14, 14],
[12, 15, 18]])
Use:
>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]
对于常规的Python列表:
>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]
如果使用Python 2,请使用xrange而不是range。
来源:堆队列算法
当top_k<<axis_length时,它优于argsort。
import numpy as np
def get_sorted_top_k(array, top_k=1, axis=-1, reverse=False):
if reverse:
axis_length = array.shape[axis]
partition_index = np.take(np.argpartition(array, kth=-top_k, axis=axis),
range(axis_length - top_k, axis_length), axis)
else:
partition_index = np.take(np.argpartition(array, kth=top_k, axis=axis), range(0, top_k), axis)
top_scores = np.take_along_axis(array, partition_index, axis)
# resort partition
sorted_index = np.argsort(top_scores, axis=axis)
if reverse:
sorted_index = np.flip(sorted_index, axis=axis)
top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
top_sorted_indexes = np.take_along_axis(partition_index, sorted_index, axis)
return top_sorted_scores, top_sorted_indexes
if __name__ == "__main__":
import time
from sklearn.metrics.pairwise import cosine_similarity
x = np.random.rand(10, 128)
y = np.random.rand(1000000, 128)
z = cosine_similarity(x, y)
start_time = time.time()
sorted_index_1 = get_sorted_top_k(z, top_k=3, axis=1, reverse=True)[1]
print(time.time() - start_time)
我认为最省时的方法是手动遍历数组并保持k-size的min-heap,正如其他人所提到的那样。
我还想出了一个蛮力方法:
top_k_index_list = [ ]
for i in range(k):
top_k_index_list.append(np.argmax(my_array))
my_array[top_k_index_list[-1]] = -float('inf')
在使用argmax获取其索引后,将最大的元素设置为一个较大的负值。然后argmax的下一次调用将返回第二大的元素。 您可以记录这些元素的原始值,并在需要时恢复它们。