我有这样的代码:

good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]

目标是根据mylist的内容是否满足条件,将它们拆分为另外两个列表。

我怎样才能做得更优雅呢?我能避免在mylist上做两个单独的迭代吗?我可以通过这样做来提高性能吗?


当前回答

第一步(pre-OP-edit):使用集合:

mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]

myset = set(mylist)
goodset = set(goodvals)

print list(myset.intersection(goodset))  # [1, 3, 7]
print list(myset.difference(goodset))    # [2, 4, 5, 6]

这对可读性(IMHO)和性能都有好处。

第二步(post-OP-edit):

创建一个好的扩展列表:

IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])

这将提高性能。否则,你现在的情况在我看来还不错。

其他回答

下面是惰性迭代器方法:

from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

它对每个项计算一次条件,并返回两个生成器,第一个生成条件为真时序列中的值,另一个生成条件为假时序列中的值。

因为它是惰性的,你可以在任何迭代器上使用它,甚至是无限迭代器:

from itertools import count, islice

def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))

primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))

通常情况下,非惰性列表返回方法会更好:

def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b

编辑:对于您更具体的用例,将项目按某些键分割到不同的列表中,这里有一个通用函数:

DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

        seq - input sequence
        resultmapping - a dictionary that maps from target lists to keys that go to that list
        keyfunc - function to calculate the key of an input value
        default - the target where items that don't have a corresponding key go, by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)

    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE

    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)

    return result_lists

用法:

def file_extension(f):
    return f[2].lower()

split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']

所有提出的解决方案的问题是,它将扫描和应用过滤功能两次。我会做一个简单的小函数,像这样:

def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b

这样你就不会重复处理任何东西,也不会重复代码。

解决方案

from itertools import tee

def unpack_args(fn):
    return lambda t: fn(*t)

def separate(fn, lx):
    return map(
        unpack_args(
            lambda i, ly: filter(
                lambda el: bool(i) == fn(el),
                ly)),
        enumerate(tee(lx, 2)))

test

[even, odd] = separate(
    lambda x: bool(x % 2),
    [1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])

这个问题已经有很多答案了,但似乎都不如我最喜欢的解决这个问题的方法,这种方法只遍历和测试每个项目一次,并使用列表理解的速度来构建两个输出列表之一,因此它只需要使用相对较慢的附加来构建一个输出列表:

bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]

In my answer to a similar question, I explain how this approach works (a combination of Python's greedy evaluation of or refraining from executing the append for "good" items, and append returning a false-like value which leaves the if condition false for "bad" items), and I show timeit results indicating that this approach outcompetes alternatives like those suggested here, especially in cases where the majority of items will go into the list built by list-comprehension (in this case, the good list).

简单的生成器版本,在内存中保存尽可能少的值,并且只调用pred一次:

from collections import deque
from typing import Callable, TypeVar, Iterable
_T = TypeVar('_T')

def iter_split(pred: Callable[[_T], bool],
               iterable: Iterable[_T]) -> tuple[Iterable[_T], Iterable[_T]]:
    """Split an iterable into two iterables based on a predicate.
    
    The predicate will only be called once per element.
    
    Returns:
        A tuple of two iterables, the first containing all elements for which
        the predicate returned True, the second containing all elements for
        which the predicate returned False.
    """
    iterator = iter(iterable)
    true_values: deque[_T] = deque()
    false_values: deque[_T] = deque()
    
    def true_generator():
        while True:
            while true_values:
                yield true_values.popleft()
            
            for item in iterator:
                if pred(item):
                    yield item
                    break
                false_values.append(item)
            else:
                break
            
    def false_generator():
        while True:
            while false_values:
                yield false_values.popleft()
            
            for item in iterator:
                if not pred(item):
                    yield item
                    break
                true_values.append(item)
            else:
                break

    return true_generator(), false_generator()