我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
我有这样的代码:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。
我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?
当前回答
受到@gnibbler伟大(但简洁!)回答的启发,我们可以应用该方法映射到多个分区:
from collections import defaultdict
def splitter(l, mapper):
"""Split an iterable into multiple partitions generated by a callable mapper."""
results = defaultdict(list)
for x in l:
results[mapper(x)] += [x]
return results
然后可以使用splitter,如下所示:
>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0) # partition l into odds and evens
>>> split.items()
>>> [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]
这适用于有更复杂映射的两个以上分区(也适用于迭代器):
>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
(1, [2]),
(2, [3]),
(3, [4, 5, 6]),
(4, [7, 8, 9]),
(5, [10, 11, 12, 13, 14, 15]),
(6, [16, 17, 18, 19, 20, 21, 22])]
或者用字典来映射:
>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]
其他回答
我认为基于N个条件来划分一个可迭代对象是很方便的
from collections import OrderedDict
def partition(iterable,*conditions):
'''Returns a list with the elements that satisfy each of condition.
Conditions are assumed to be exclusive'''
d= OrderedDict((i,list())for i in range(len(conditions)))
for e in iterable:
for i,condition in enumerate(conditions):
if condition(e):
d[i].append(e)
break
return d.values()
例如:
ints,floats,other = partition([2, 3.14, 1, 1.69, [], None],
lambda x: isinstance(x, int),
lambda x: isinstance(x, float),
lambda x: True)
print " ints: {}\n floats:{}\n other:{}".format(ints,floats,other)
ints: [2, 1]
floats:[3.14, 1.69]
other:[[], None]
如果元素可以满足多个条件,则删除断点。
清晰快速
这个列表理解是简单的阅读和快速。这正是上级要求的。
set_good_vals = set(good_vals) # Speed boost.
good = [x for x in my_list if x in set_good_vals]
bad = [x for x in my_list if x not in set_good_vals]
我更喜欢一个列表理解而不是两个,但不像张贴的许多答案(其中一些相当巧妙),它是可读的和清晰的。这也是网页上最快的答案之一。
唯一(稍微)快一点的答案是:
set_good_vals = set(good_vals)
good, bad = [], []
for item in my_list:
_ = good.append(item) if item in set_good_vals else bad.append(item)
...还有它的变体。(见我的另一个答案)。但我觉得第一种方法更优雅,而且几乎一样快。
Good = [x for x in mylist if x in goodvals] Bad = [x for x in mylist if x not in goodvals] 我怎样才能做得更优雅呢?
代码已经非常优雅了。
使用集合可能会有轻微的性能改进,但差异是微不足道的。基于集合的方法也会丢弃重复项,并且不会保留元素的顺序。我发现列表理解也更容易阅读。
事实上,我们甚至可以更简单地使用for循环:
good, bad = [], []
for x in mylist:
if x in goodvals:
good.append(f)
else:
bad.append(f)
这种方法可以更容易地添加额外的逻辑。例如,代码很容易被修改为丢弃None值:
good, bad = [], []
for x in mylist:
if x is None:
continue
if x in goodvals:
good.append(f)
else:
bad.append(f)
之前的答案似乎并不能满足我所有的四种强迫症:
尽可能的懒惰, 只对原始Iterable求值一次 每个项只计算谓词一次 提供良好的类型注释(适用于python 3.7)
我的解决方案并不漂亮,我不认为我可以推荐使用它,但它是:
def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
deque_predicate_true = deque()
deque_predicate_false = deque()
# define a generator function to consume the input iterable
# the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded
def shared_generator(definitely_an_iterator):
for item in definitely_an_iterator:
print("Evaluate predicate.")
if predicate(item):
deque_predicate_true.appendleft(item)
yield True
else:
deque_predicate_false.appendleft(item)
yield False
# consume input iterable only once,
# converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
shared_gen = shared_generator(
iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
)
# define a generator function for each predicate outcome and queue
def iter_for(predicate_value, hold_queue):
def consume_shared_generator_until_hold_queue_contains_something():
if not hold_queue:
try:
while next(shared_gen) != predicate_value:
pass
except:
pass
consume_shared_generator_until_hold_queue_contains_something()
while hold_queue:
print("Yield where predicate is "+str(predicate_value))
yield hold_queue.pop()
consume_shared_generator_until_hold_queue_contains_something()
# return a tuple of two generators
return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)
用下面的测试,我们从print语句中得到如下输出:
t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]
good.append(x) if x in goodvals else bad.append(x)
来自@dansalmo的这个优雅简洁的回答被埋没在评论中,所以我只是把它作为一个答案转发到这里,这样它就能得到应有的重视,尤其是对新读者来说。
完整的例子:
good, bad = [], []
for x in my_list:
good.append(x) if x in goodvals else bad.append(x)