如何从列表中删除重复项，同时保持顺序?

如何从列表中删除重复项，同时保持顺序?使用集合删除重复项会破坏原始顺序。是否有内置的或python的习语?

当前回答

MizardX的答案提供了多种方法的良好集合。

这是我自言自语时想到的:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

2011-10-09 14:16:00

其他回答

Pandas用户应该查看Pandas .unique。

>>> import pandas as pd
>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> pd.unique(lst)
array([1, 2, 3, 4])

该函数返回一个NumPy数组。如果需要，可以使用tolist方法将其转换为列表。

2020-05-26 09:30:14

最佳解决方案因Python版本和环境限制而异:

Python 3.7+(大多数解释器支持3.6，作为实现细节):

在PyPy 2.5.0中首次引入，并在CPython 3.6中作为实现细节采用，在Python 3.7中成为语言保证之前，plain dict是插入顺序的，甚至比collections.OrderedDict(也是在CPython 3.5中实现的C)更有效。到目前为止，最快的解决方案也是最简单的:

>>> items = [1, 2, 0, 1, 3, 2]
>>> list(dict.fromkeys(items))  # Or [*dict.fromkeys(items)] if you prefer
[1, 2, 0, 3]

像list(set(items))一样，这将所有工作推到C层(在CPython上)，但由于dicts是插入顺序的，dict.fromkeys不会失去顺序。它比list(set(items))慢(通常需要50-100%的时间)，但比任何其他保持顺序的解决方案快得多(在listcomp中使用set需要大约一半的时间)。

重要提示:more_itertools的unique_everseen解决方案(见下面)在惰性和支持非哈希输入项方面有一些独特的优势;如果您需要这些特性，那么这是唯一可行的解决方案。

Python 3.5(以及性能不重要的所有旧版本)

正如Raymond指出的，在CPython 3.5中，OrderedDict是用C实现的，丑陋的列表理解比OrderedDict.fromkeys要慢(除非你真的需要在结尾使用列表——即使这样，也只有在输入非常短的情况下)。因此，在性能和可读性上，CPython 3.5的最佳解决方案是OrderedDict等价于3.6+使用普通dict:

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

在CPython 3.4及更早版本上，这将比其他一些解决方案慢，所以如果分析显示您需要更好的解决方案，请继续阅读。

Python 3.4或更早版本，如果性能很关键且第三方模块是可接受的

正如@abarnert指出的那样，more_itertools库(pip install more_itertools)包含一个unique_everseen函数，该函数是为了解决这个问题而构建的，而不会在列表推导中出现任何不可读(not seen.add)的变化。这也是最快的解决方案:

>>> from more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]

只是一个简单的库导入，没有黑客。

该模块正在调整itertools配方unique_everseen，它看起来像:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

但与itertools食谱不同的是，它支持不可哈希项(以性能为代价;如果iterable中的所有元素都是不可哈希的，则算法变成O(n²)vs. O(n)(如果它们都是可哈希的)。

重要提示:与这里所有其他解决方案不同，unique_everseen可以惰性使用;峰值内存使用将是相同的(最终，底层集合增长到相同的大小)，但如果不列出结果，只需迭代它，就能够在找到唯一项时处理它们，而不是等到整个输入重复数据删除后才处理第一个唯一项。

Python 3.4及更早版本，如果性能非常关键且第三方模块不可用

你有两个选择:

复制并粘贴unique_everseen配方到您的代码中，并在上面的more_itertools示例中使用它使用丑陋的hack，允许一个listcomp检查和更新一个集，以跟踪所看到的内容: Seen = set() [x for x in seq if x not in seen and not seen.add(x)] 以依赖丑陋的黑客为代价: 不是seen.add (x) 这取决于一个事实。add是一个原地方法，总是返回None，所以not None的值为True。

Note that all of the solutions above are O(n) (save calling unique_everseen on an iterable of non-hashable items, which is O(n²), while the others would fail immediately with a TypeError), so all solutions are performant enough when they're not the hottest code path. Which one to use depends on which versions of the language spec/interpreter/third-party modules you can rely on, whether or not performance is critical (don't assume it is; it usually isn't), and most importantly, readability (because if the person who maintains this code later ends up in a murderous mood, your clever micro-optimization probably wasn't worth it).

2013-06-10 02:47:13

就地方法

这个方法是二次的，因为我们对列表中的每个元素都有一个线性查找(由于del，我们必须加上重新排列列表的代价)。

也就是说，如果我们从列表的末尾开始，并向原点前进，删除出现在其左侧子列表中的每一项，就有可能在原地操作

这个想法在代码中很简单

for i in range(len(l)-1,0,-1): 
    if l[i] in l[:i]: del l[i]

实现的简单测试

In [91]: from random import randint, seed                                                                                            
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
In [93]: for i in range(len(l)-1,0,-1): 
    ...:     print(l) 
    ...:     print(i, l[i], l[:i], end='') 
    ...:     if l[i] in l[:i]: 
    ...:          print( ': remove', l[i]) 
    ...:          del l[i] 
    ...:     else: 
    ...:          print() 
    ...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]

In [94]:

2018-11-04 17:30:53

x = [1, 2, 1, 3, 1, 4]

# brute force method
arr = []
for i in x:
  if not i in arr:
    arr.insert(x[i],i)

# recursive method
tmp = []
def remove_duplicates(j=0):
    if j < len(x):
      if not x[j] in tmp:
        tmp.append(x[j])
      i = j+1  
      remove_duplicates(i)

      

remove_duplicates()

2020-08-23 00:55:38

对于另一个非常古老的问题的一个非常晚的回答:

itertools食谱有一个函数可以做到这一点，使用了见集技术，但是:

处理标准键函数。不使用不体面的黑客。通过预绑定优化循环。加，而不是查N次。(f7也这样做，但有些版本没有。) 通过使用ifilterfalse优化循环，因此只需遍历Python中唯一的元素，而不是所有元素。(当然，您仍然在ifilterfalse中遍历所有它们，但这是在C中，而且要快得多。)

Is it actually faster than f7? It depends on your data, so you'll have to test it and see. If you want a list in the end, f7 uses a listcomp, and there's no way to do that here. (You can directly append instead of yielding, or you can feed the generator into the list function, but neither one can be as fast as the LIST_APPEND inside a listcomp.) At any rate, usually, squeezing out a few microseconds is not going to be as important as having an easily-understandable, reusable, already-written function that doesn't require DSU when you want to decorate.

和所有的食谱一样，它也有更多的版本。

如果你只想要无键的情况，你可以简化为:

def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in itertools.ifilterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

2013-10-09 18:27:09

如何从列表中删除重复项，同时保持顺序?

推荐文章

最新文章

标签