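I need to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with: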
import random

def weightedChoice(choices):
    """Like random.choice, but each element can have a different chance of
    being selected.
    choices can be any iterable containing iterables with two items each.
    Technically, they can have more than two items, the rest will just be
    ignored. The first item is the thing being chosen, the second item is
    its weight. The weights can be any numeric values, what matters is the
    relative differences between them.
    """
    space = {}
    current = 0
    for item in choices:
        choice, weight = item[0], item[1]  # extra fields are ignored, as documented
        if weight > 0:
            space[current] = choice
            current += weight
    if not space:
        return None  # no positive weights, nothing to choose
    rand = random.uniform(0, current)
    for key in sorted(space) + [current]:  # sorted() returns a list on Python 2 and 3
        if rand < key:
            return choice
        choice = space.get(key, choice)  # key == current is only the upper bound
    return choice  # reached only if uniform() returned its upper bound exactly
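This function seems overly complex to me, and it's ugly. I'm hoping everyone here can offer some suggestions for improving it, or other ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.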
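For comparison, a minimal sketch assuming Python 3.6 or later, where the standard library's random.choices accepts a weights argument directly. The helper name weighted_choice_stdlib is hypothetical, and it assumes the same (item, weight) pair format as the question.

```python
import random

def weighted_choice_stdlib(choices):
    """Pick one item from (item, weight) pairs using random.choices (Python 3.6+)."""
    items = [item for item, _ in choices]
    weights = [weight for _, weight in choices]
    # random.choices returns a list of k picks; k defaults to 1
    return random.choices(items, weights=weights)[0]

print(weighted_choice_stdlib([("WHITE", 90), ("RED", 8), ("GREEN", 2)]))
```

Passing k=... to random.choices returns several picks in one call, which avoids recomputing the running totals for every draw.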
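I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index for tallying purposes, but returning the item string is simple (see the commented-out alternative return):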
import random
import bisect

try:
    range = xrange  # Python 2: use the lazy xrange for the tally loop
except NameError:
    pass  # Python 3: range is already lazy

def weighted_choice(choices):
    total, cumulative = 0, []
    for c, w in choices:
        total += w
        cumulative.append((total, c))
    r = random.uniform(0, total)
    # return index
    return bisect.bisect(cumulative, (r,))
    # return item string
    #return choices[bisect.bisect(cumulative, (r,))][0]

# define choices and relative weights
choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
tally = [0 for item in choices]
n = 100000
# tally up n weighted choices
for i in range(n):
    tally[weighted_choice(choices)] += 1
# float division so the percentages are not truncated to 0 on Python 2
print([100.0 * t / sum(tally) for t in tally])
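The lookup bisect.bisect(cumulative, (r,)) relies on Python's tuple ordering: tuples compare element by element, and when the first elements tie, the shorter tuple (r,) sorts before (total, item). So bisect counts the running totals strictly below r, which is exactly the index of the chosen item. A small check with fixed r values; the helper pick_index is a hypothetical name used only for illustration.

```python
import bisect

# Running totals for the weights 90, 8, 2, in the same (total, item) shape as above
cumulative = [(90, "WHITE"), (98, "RED"), (100, "GREEN")]

def pick_index(r):
    # (r,) is compared with each (total, item) on the first element; on a tie the
    # shorter tuple sorts first, so bisect counts the running totals strictly below r
    return bisect.bisect(cumulative, (r,))

for r in (0.0, 45.5, 90.0, 90.1, 99.5, 100.0):
    idx = pick_index(r)
    print(r, idx, cumulative[idx][1])
```

Because a tie resolves to the lower index, even r equal to the grand total maps to the last item rather than running off the end of the list.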
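If you don't decide in advance how many items you want to pick (so you're not doing something like k=10) and you only have probabilities, you can do the following. Note that your probabilities don't need to add up to 1; they can be independent of each other: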
import random

soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]
selected_items = [item for item, p in zip(soup_items, items_probability) if random.random() < p]
print(selected_items)
# one possible output: ['pepper', 'tomato']
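This is a different operation from a weighted choice: each item is kept or dropped by its own coin flip, so the number of selected items varies from run to run, and its expected value is the sum of the probabilities (1.5 for the soup example). A sketch that wraps the comprehension in a function; the name select_independently and the optional rng parameter are hypothetical additions, the latter so that a seeded random.Random can make runs reproducible.

```python
import random

def select_independently(items, probabilities, rng=random):
    """Keep each item independently with its own probability (one Bernoulli trial per item)."""
    return [item for item, p in zip(items, probabilities) if rng.random() < p]

soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]

# A seeded generator makes the selection repeatable between runs
rng = random.Random(42)
print(select_independently(soup_items, items_probability, rng))

# Averaged over many runs, the number of selected items approaches 1.5,
# the sum of items_probability
runs = 10000
mean_count = sum(len(select_independently(soup_items, items_probability, rng))
                 for _ in range(runs)) / runs
print(mean_count)
```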
If you have a weighted dictionary instead of a list, you can write this:
import random

items = {"a": 10, "b": 5, "c": 1}
random.choice([k for k in items for dummy in range(items[k])])
Note that [k for k in items for dummy in range(items[k])] produces this list (the key order follows the dictionary's iteration order): ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']
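The expanded list has sum(items.values()) entries, so this approach suits small integer weights but grows with the weights and cannot represent fractional ones. A sketch that checks the expansion and, assuming Python 3.6 or later, shows random.choices taking the dictionary's weights directly without building the list.

```python
import random
from collections import Counter

items = {"a": 10, "b": 5, "c": 1}

# Expand each key according to its integer weight, as in the one-liner above
expanded = [k for k in items for _ in range(items[k])]
print(len(expanded))  # 16 entries in total
print(Counter(expanded) == Counter(items))  # each key appears exactly weight times

# Alternative without the expanded list (Python 3.6+): pass the weights directly
keys = list(items)
pick = random.choices(keys, weights=[items[k] for k in keys])[0]
print(pick)
```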
If you happen to have Python 3 and are afraid of installing numpy or writing your own loops, you could do:
import itertools, bisect, random

def weighted_choice(choices):
    weights = list(zip(*choices))[1]
    return choices[bisect.bisect(list(itertools.accumulate(weights)),
                                 random.uniform(0, sum(weights)))][0]
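One caveat with the one-liner: random.uniform can, through floating-point rounding, return its upper bound, in which case bisect.bisect returns len(choices) and the indexing raises IndexError. It is rare, but a variant can clamp the search with bisect's hi argument. The sketch below is such a variant (not the answer's exact code) and tallies many draws with collections.Counter so the observed frequencies can be compared with the relative weights 90/8/2.

```python
import bisect
import itertools
import random
from collections import Counter

def weighted_choice(choices):
    """Return an item from (item, weight) pairs, chosen with probability proportional to weight."""
    weights = [w for _, w in choices]
    cumulative = list(itertools.accumulate(weights))
    r = random.random() * cumulative[-1]
    # hi=len(cumulative) - 1 keeps the index in range even if rounding pushes r up to the total
    idx = bisect.bisect(cumulative, r, 0, len(cumulative) - 1)
    return choices[idx][0]

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
n = 100000
tally = Counter(weighted_choice(choices) for _ in range(n))
# Observed percentages should land close to 90, 8 and 2
print({item: 100.0 * tally[item] / n for item, _ in choices})
```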
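Because you can build anything with a bag of pipe fittings! Although... I have to admit that Ned's answer, while slightly longer, is easier to understand.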
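It depends on how many times you want to sample the distribution.
Suppose you want to sample the distribution K times, and n is the number of items in the distribution. If you call np.random.choice() once per sample, the total time complexity is O(K(n + log(n))).
In my case, I needed to sample the same distribution many times (K on the order of 10^3, with n on the order of 10^6). I used the code below, which precomputes the cumulative distribution once and then samples it in O(log(n)) per draw. The overall time complexity is O(n + K*log(n)).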
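import numpy as np

n, k = 10**6, 10**3
# Create a dummy distribution: n items with uniform probabilities
a = np.arange(1, n + 1)
p = np.full(n, 1.0 / n)
# Precompute the cumulative distribution once: O(n)
cfd = p.cumsum()
cfd /= cfd[-1]  # guard against floating-point drift so that cfd[-1] == 1.0
for _ in range(k):
    x = np.random.uniform()  # in [0, 1)
    # Binary search into the cumulative distribution: O(log n) per sample
    idx = cfd.searchsorted(x, side='right')
    sampled_element = a[idx]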
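Since the loop above draws one uniform at a time, a possible refinement is to draw all k uniforms at once and let searchsorted handle the whole array in a single call. For comparison, NumPy's Generator.choice also accepts a p argument with the probabilities. A sketch assuming a NumPy version with np.random.default_rng; the smaller sizes and the non-uniform weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10**4, 10**3

a = np.arange(1, n + 1)
# Illustrative non-uniform weights: element i has weight proportional to i
p = a / a.sum()

# Precompute the cumulative distribution once, normalized so its last entry is exactly 1.0
cfd = np.cumsum(p)
cfd /= cfd[-1]

# Draw all k uniforms in [0, 1) at once; searchsorted accepts an array of query values
x = rng.random(k)
samples = a[np.searchsorted(cfd, x, side='right')]

# Built-in alternative: Generator.choice with explicit probabilities
samples_builtin = rng.choice(a, size=k, p=p)
print(samples[:5], samples_builtin[:5])
```

The vectorized searchsorted keeps the same O(n + K*log(n)) bound, but moves the per-sample loop out of Python and into NumPy.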