假设我有两个表,l1和l2。我想执行l1 - l2,返回l1中不在l2中的所有元素。
我可以想出一个简单的循环方法来做这个,但那真的很低效。python式的高效方法是什么?
举个例子,如果l1 = [1,2,6,8], l2 = [2,3,5,8], l1 - l2应该返回[1,6]
假设我有两个表,l1和l2。我想执行l1 - l2,返回l1中不在l2中的所有元素。
我可以想出一个简单的循环方法来做这个,但那真的很低效。python式的高效方法是什么?
举个例子,如果l1 = [1,2,6,8], l2 = [2,3,5,8], l1 - l2应该返回[1,6]
当前回答
使用Set推导式{x for x in l2}或Set (l2)来获取Set,然后使用List推导式来获取List
l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]
基准测试代码:
import time
l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))
l2set = {x for x in l2}
tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)
tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)
print("speedup %fx"%(difflist/diffset))
基准测试结果:
0.0015058517456054688
3.968189239501953
speedup 2635.179227x
其他回答
扩展Donut的答案和这里的其他答案,通过使用生成器推导式而不是列表推导式,以及使用集合数据结构(因为in操作符在列表上是O(n),而在集合上是O(1)),您可以得到更好的结果。
这里有一个函数适合你:
def filter_list(full_list, excludes):
s = set(excludes)
return (x for x in full_list if x not in s)
结果将是一个可迭代对象,它将惰性地获取过滤后的列表。如果你需要一个真正的列表对象(例如,如果你需要对结果执行len()),那么你可以很容易地像这样构建一个列表:
filtered_list = list(filter_list(full_list, excludes))
使用Set推导式{x for x in l2}或Set (l2)来获取Set,然后使用List推导式来获取List
l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]
基准测试代码:
import time
l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))
l2set = {x for x in l2}
tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)
tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)
print("speedup %fx"%(difflist/diffset))
基准测试结果:
0.0015058517456054688
3.968189239501953
speedup 2635.179227x
试试这个:
l1=[1,2,6,8]
l2=[2,3,5,8]
r=[]
for x in l1:
if x in l2:
continue
r=r+[x]
print(r)
一种方法是使用集合:
>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])
但是请注意,集合不会保留元素的顺序,并且会删除任何重复的元素。元素也需要是可哈希的。如果这些限制是可以容忍的,那么这通常是最简单和性能最高的选项。
使用filterfalse而不使用lambda-expression
When using functions like filter or filterfalse and similar from itertools you can usually save performance by avoiding lambda-expressions and using already existing functions. Instances of list and set defines a __contains__-method to use for containment checks. The in-operator calls this method under the hood, so using x in l2 can be replaced by l2.__contains__(x). Usually this replacement is not really prettier but in this specific case it allows us to gain better performance than using a lambda-expression, when used in combination with filterfalse:
>>> from itertools import filterfalse
>>> l1 = [1, 2, 6, 8]
>>> l2 = [2, 3, 5, 8]
>>> list(filterfalse(l2.__contains__, l1))
[1, 6]
Filterfalse创建一个迭代器,该迭代器生成的所有元素在用作12.2 .__contains__的参数时返回false。
Sets有一个更快的__contains__实现,所以更好的是:
>>> from itertools import filterfalse
>>> l1 = [1, 2, 6, 8]
>>> l2 = set([2, 3, 5, 8])
>>> list(filterfalse(l2.__contains__, l1))
[1, 6]
性能
使用列表:
$ python3 -m timeit -s "from itertools import filterfalse; l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filterfalse(l2.__contains__, l1))"
500000 loops, best of 5: 522 nsec per loop
使用设置:
$ python3 -m timeit -s "from itertools import filterfalse; l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filterfalse(l2.__contains__, l1))"
1000000 loops, best of 5: 359 nsec per loop