我如何有效地获得一个NumPy数组中每个唯一值的频率计数?
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]
我如何有效地获得一个NumPy数组中每个唯一值的频率计数?
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> freq_count(x)
[(1, 5), (2, 3), (5, 1), (25, 1)]
当前回答
用这个:
>>> import numpy as np
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> np.array(np.unique(x, return_counts=True)).T
array([[ 1, 5],
[ 2, 3],
[ 5, 1],
[25, 1]])
最初的回答:
使用scipy.stats.itemfreq(警告:已弃用):
>>> from scipy.stats import itemfreq
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[ 1., 5.],
[ 2., 3.],
[ 5., 1.],
[ 25., 1.]])
其他回答
这是迄今为止最通用和性能最好的解决方案;很惊讶它还没有发布。
import numpy as np
def unique_count(a):
unique, inverse = np.unique(a, return_inverse=True)
count = np.zeros(len(unique), np.int)
np.add.at(count, inverse, 1)
return np.vstack(( unique, count)).T
print unique_count(np.random.randint(-10,10,100))
与目前接受的答案不同,它适用于任何可排序的数据类型(不仅仅是正整数),并且具有最佳性能;唯一重要的开销是np.unique所做的排序。
使用pandas模块:
>>> import pandas as pd
>>> import numpy as np
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> pd.value_counts(x)
1 5
2 3
25 1
5 1
dtype: int64
像这样的东西应该做到:
#create 100 random numbers
arr = numpy.random.random_integers(0,50,100)
#create a dictionary of the unique values
d = dict([(i,0) for i in numpy.unique(arr)])
for number in arr:
d[j]+=1 #increment when that value is found
另外,之前的这篇关于有效计算独特元素的文章似乎与您的问题非常相似,除非我遗漏了什么。
为了计算唯一的非整数——类似于Eelco Hoogendoorn的答案,但速度要快得多(在我的机器上是5倍),我使用了weave。内联组合numpy。只有一点c代码;
import numpy as np
from scipy import weave
def count_unique(datain):
"""
Similar to numpy.unique function for returning unique members of
data, but also returns their counts
"""
data = np.sort(datain)
uniq = np.unique(data)
nums = np.zeros(uniq.shape, dtype='int')
code="""
int i,count,j;
j=0;
count=0;
for(i=1; i<Ndata[0]; i++){
count++;
if(data(i) > data(i-1)){
nums(j) = count;
count = 0;
j++;
}
}
// Handle last value
nums(j) = count+1;
"""
weave.inline(code,
['data', 'nums'],
extra_compile_args=['-O2'],
type_converters=weave.converters.blitz)
return uniq, nums
配置文件信息
> %timeit count_unique(data)
> 10000 loops, best of 3: 55.1 µs per loop
Eelco的纯numpy版本:
> %timeit unique_count(data)
> 1000 loops, best of 3: 284 µs per loop
Note
这里存在冗余(unique也执行排序),这意味着可以通过将唯一功能放入c-code循环中来进一步优化代码。
用这个:
>>> import numpy as np
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> np.array(np.unique(x, return_counts=True)).T
array([[ 1, 5],
[ 2, 3],
[ 5, 1],
[25, 1]])
最初的回答:
使用scipy.stats.itemfreq(警告:已弃用):
>>> from scipy.stats import itemfreq
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[ 1., 5.],
[ 2., 3.],
[ 5., 1.],
[ 25., 1.]])