如何从列表中删除重复项,同时保持顺序?使用集合删除重复项会破坏原始顺序。 是否有内置的或python的习语?
当前回答
这里有一个简单的方法:
list1 = ["hello", " ", "w", "o", "r", "l", "d"]
sorted(set(list1 ), key=list1.index)
输出如下:
["hello", " ", "w", "o", "r", "l", "d"]
其他回答
l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))
一个生成器表达式,它使用集合的O(1)查找来确定是否在新列表中包含元素。
如果你经常使用pandas,并且美学优先于性能,那么考虑内置函数pandas. series .drop_duplicate:
import pandas as pd
import numpy as np
uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()
# from the chosen answer
def f7(seq):
seen = set()
seen_add = seen.add
return [ x for x in seq if not (x in seen or seen_add(x))]
alist = np.random.randint(low=0, high=1000, size=10000).tolist()
print uniquifier(alist) == f7(alist) # True
时间:
In [104]: %timeit f7(alist)
1000 loops, best of 3: 1.3 ms per loop
In [110]: %timeit uniquifier(alist)
100 loops, best of 3: 4.39 ms per loop
就地方法
这个方法是二次的,因为我们对列表中的每个元素都有一个线性查找(由于del,我们必须加上重新排列列表的代价)。
也就是说,如果我们从列表的末尾开始,并向原点前进,删除出现在其左侧子列表中的每一项,就有可能在原地操作
这个想法在代码中很简单
for i in range(len(l)-1,0,-1):
if l[i] in l[:i]: del l[i]
实现的简单测试
In [91]: from random import randint, seed
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics
In [93]: for i in range(len(l)-1,0,-1):
...: print(l)
...: print(i, l[i], l[:i], end='')
...: if l[i] in l[:i]:
...: print( ': remove', l[i])
...: del l[i]
...: else:
...: print()
...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]
In [94]:
1. 这些解决方案很好…… 为了在保留秩序的同时删除重复项,本页其他地方提出了优秀的解决方案:
seen = set()
[x for x in seq if not (x in seen or seen.add(x))]
以及变化,例如:
seen = set()
[x for x in seq if x not in seen and not seen.add(x)]
确实很受欢迎,因为它们简单、极简,并部署了正确的哈希以获得最佳效率。关于这些方法的主要抱怨似乎是,将方法see .add(x)“返回”的不变量None用作逻辑表达式中的常量(因此是多余的/不必要的)值(只是为了它的副作用)是笨拙和/或令人困惑的。
2. …but they waste one hash lookup per iteration. Surprisingly, given the amount of discussion and debate on this topic, there is actually a significant improvement to the code that seems to have been overlooked. As shown, each "test-and-set" iteration requires two hash lookups: the first to test membership x not in seen and then again to actually add the value seen.add(x). Since the first operation guarantees that the second will always be successful, there is a wasteful duplication of effort here. And because the overall technique here is so efficient, the excess hash lookups will likely end up being the most expensive proportion of what little work remains.
3.相反,让布景完成它的工作吧! 注意,上面的例子只调用set。加上预见,这样做总是会导致集合成员的增加。集合本身永远没有机会拒绝副本;我们的代码片段实际上已经篡夺了这个角色。使用显式的两步测试和设置代码剥夺了set自身排除这些重复的核心能力。
4. 单哈希查找代码: 下面的版本将每次迭代的哈希查找次数减少了一半,从两次减少到只有一次。
seen = set()
[x for x in seq if len(seen) < len(seen.add(x) or seen)]
不使用导入模块或集的解决方案:
text = "ask not what your country can do for you ask what you can do for your country"
sentence = text.split(" ")
noduplicates = [(sentence[i]) for i in range (0,len(sentence)) if sentence[i] not in sentence[:i]]
print(noduplicates)
给输出:
['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']
推荐文章
- 证书验证失败:无法获得本地颁发者证书
- 当使用pip3安装包时,“Python中的ssl模块不可用”
- 无法切换Python与pyenv
- Python if not == vs if !=
- 如何从scikit-learn决策树中提取决策规则?
- 为什么在Mac OS X v10.9 (Mavericks)的终端中apt-get功能不起作用?
- 将旋转的xtick标签与各自的xtick对齐
- 为什么元组可以包含可变项?
- 如何合并字典的字典?
- 如何创建类属性?
- 不区分大小写的“in”
- 在Python中获取迭代器中的元素个数
- 解析日期字符串并更改格式
- 使用try和。Python中的if
- 如何在Python中获得所有直接子目录