How do I remove duplicates from a list while preserving order?

Using a set to remove duplicates destroys the original order. Is there a built-in or a Pythonic idiom for this?

Current answer

If you routinely use pandas and prefer aesthetics over performance, consider the built-in method pandas.Series.drop_duplicates:

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print(uniquifier(alist) == f7(alist))  # True

Timings:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop
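
The figures above come from IPython's %timeit. Outside IPython, a similar measurement can be sketched with the standard-library timeit module. The seed, the repeat counts, and the pure-Python test data are assumptions chosen for illustration, so absolute numbers will differ from those quoted. Only f7 is timed here; the pandas variant could be timed the same way where pandas is installed.

```python
import random
import timeit

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

# Assumed test data: 10000 integers in [0, 1000), fixed seed for repeatability.
random.seed(0)
alist = [random.randrange(1000) for _ in range(10000)]

# Take the best of 3 repeats of 100 calls each, then convert to seconds per call.
best = min(timeit.repeat(lambda: f7(alist), number=100, repeat=3)) / 100
print(f"f7: {best * 1e3:.3f} ms per call")
```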

2015-07-18 00:10:51

Other answers

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
[unique.append(item) for item in sequence if item not in unique]

unique → ['1', '2', '3', '6', '4', '5']
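
The same idea can be written as a plain loop, which avoids building a throwaway list of None values from the comprehension. A minimal sketch: because `item not in unique` scans a list, the whole thing is quadratic in the worst case, but it only needs equality comparison, so it also works for unhashable items.

```python
sequence = ['1', '2', '3', '3', '6', '4', '5', '6']

unique = []
for item in sequence:
    # Membership test on a list is a linear scan (O(n) per item),
    # but it relies only on ==, not on hashing.
    if item not in unique:
        unique.append(item)

print(unique)  # ['1', '2', '3', '6', '4', '5']
```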

2013-04-13 17:32:19

For unhashable types (e.g. a list of lists), based on MizardX's answer:

def f7_noHash(seq):
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]
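
One caveat with str() as the key is that distinct top-level values can share a string form, for example 1 and '1'. A hedged variation is to key on repr() instead; the function name unique_by_repr is hypothetical. This is still only an approximation of equality, since values that compare equal, such as 1 and 1.0, have different repr strings and would both be kept.

```python
def unique_by_repr(seq):
    """Order-preserving dedupe for unhashable items, keyed on repr(x)."""
    seen = set()
    result = []
    for x in seq:
        key = repr(x)  # repr(1) is '1' but repr('1') is "'1'", so they stay distinct
        if key not in seen:
            seen.add(key)
            result.append(x)
    return result

print(unique_by_repr([1, '1', [1], [1]]))  # [1, '1', [1]]
```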

2011-08-21 20:04:12

If you need a one-liner, then this may help:

from functools import reduce  # needed on Python 3, where reduce is no longer a built-in
reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x], lst))

... it should work, but correct me if I'm wrong.
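
Without an initial value, reduce raises TypeError on an empty list. A possible variation passes [] as the initial accumulator, which also removes the need for the inner map; the name dedupe_reduce is hypothetical. Like the one-liner, it does a linear `in` scan per item, so it suits short lists.

```python
from functools import reduce

def dedupe_reduce(lst):
    # acc holds the deduplicated items so far; [] as the initial value
    # makes the empty-input case return [] instead of raising TypeError.
    return reduce(lambda acc, item: acc if item in acc else acc + [item], lst, [])

print(dedupe_reduce([3, 1, 3, 2, 1]))  # [3, 1, 2]
print(dedupe_reduce([]))               # []
```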

2011-08-05 17:06:25

from itertools import groupby
[ key for key,_ in groupby(sortedList)]

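The list doesn't even have to be sorted; the sufficient condition is that equal values are grouped together.

Edit: I assumed that "preserving order" implies that the list is actually ordered. If that is not the case, then MizardX's solution is the right one.

Community edit: This is, however, the most elegant way to "compress duplicate consecutive elements into a single element".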
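
A short illustration of that distinction, with a hypothetical helper name squeeze_runs: groupby only merges adjacent equal items, so a value that reappears later in the list is kept again.

```python
from itertools import groupby

def squeeze_runs(seq):
    # groupby starts a new group whenever the value changes,
    # so only consecutive duplicates are collapsed.
    return [key for key, _ in groupby(seq)]

print(squeeze_runs([1, 1, 2, 2, 2, 3]))  # [1, 2, 3]
print(squeeze_runs([1, 2, 1, 1, 3]))     # [1, 2, 1, 3]  (the later 1 survives)
```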

2009-01-26 15:47:14

1. These solutions are fine… For removing duplicates while preserving order, the excellent solutions proposed elsewhere on this page:

seen = set()
[x for x in seq if not (x in seen or seen.add(x))]

and variations, e.g.:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

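are indeed popular because they are simple, minimal, and use hashing correctly for optimal efficiency. The main complaint about them seems to be that using the invariant None "returned" by the method seen.add(x) as a constant (and therefore redundant) value in a logical expression, purely for its side effect, is hacky and/or confusing.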
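
For readers who find the None trick confusing, the same test-and-set logic can be spelled out as a generator. This is only a sketch with a hypothetical name, iter_unique; it performs the same two set operations per new item as the comprehensions above.

```python
def iter_unique(seq):
    """Yield each item the first time it appears, in original order."""
    seen = set()
    for x in seq:
        if x not in seen:   # explicit membership test
            seen.add(x)     # explicit side effect, no reliance on None
            yield x

print(list(iter_unique([3, 1, 3, 2, 1])))  # [3, 1, 2]
```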

2. …but they waste one hash lookup per iteration. Surprisingly, given the amount of discussion and debate on this topic, there is actually a significant improvement to the code that seems to have been overlooked. As shown, each "test-and-set" iteration requires two hash lookups: the first to test membership x not in seen and then again to actually add the value seen.add(x). Since the first operation guarantees that the second will always be successful, there is a wasteful duplication of effort here. And because the overall technique here is so efficient, the excess hash lookups will likely end up being the most expensive proportion of what little work remains.

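3. Instead, let the set do its job! Notice that the examples above only ever call set.add with the foreknowledge that doing so will always increase the set's membership. The set itself never gets a chance to reject a duplicate; our code snippet has essentially usurped that role. Using explicit two-step test-and-set code robs the set of its core ability to exclude those duplicates itself.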

4. The single-hash-lookup code: The following version cuts the number of hash lookups per iteration in half, from two down to just one.

seen = set()
[x for x in seq if len(seen) < len(seen.add(x) or seen)]
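
The comparison works because len(seen) on the left is evaluated before seen.add(x) runs on the right, and `seen.add(x) or seen` evaluates to the set itself since add returns None. An unrolled sketch of the same idea, under a hypothetical name unique_single_lookup, may be easier to follow. Whether it is actually faster than the two-lookup form is something to measure for your own data.

```python
def unique_single_lookup(seq):
    seen = set()
    result = []
    for x in seq:
        before = len(seen)     # size before attempting the insert
        seen.add(x)            # the set silently ignores a duplicate
        if len(seen) > before: # size grew, so x was new
            result.append(x)
    return result

print(unique_single_lookup([3, 1, 3, 2, 1]))  # [3, 1, 2]
```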

2021-07-08 20:31:29
