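How do I find the duplicates in a list and create another list with them?

How do I find the duplicates in a list of integers and create another list of the duplicates?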

Current answer

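Here is a fast generator that uses a dict to store each element as a key, with a boolean value that records whether the duplicate has already been yielded.

For lists whose elements are all hashable types: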

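def gen_dupes(array):
    # Maps each value to True until it has been yielded as a duplicate, then False.
    unique = {}
    for value in array:
        if value in unique:
            # Repeat occurrence: yield it only the first time it repeats.
            if unique[value]:
                unique[value] = False
                yield value
        else:
            # First occurrence: remember it, nothing to yield yet.
            unique[value] = True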

array = [1, 2, 2, 3, 4, 1, 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, 1, 6]
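
As a rough sketch of the same bookkeeping, the boolean flags can be replaced by two sets: one for values seen at least once and one for values already reported. The name `iter_dupes` is made up for this illustration and is not part of the answer above. Each duplicate is reported once, however many times it repeats.

```python
def iter_dupes(items):
    """Yield each duplicated value once, at its second occurrence."""
    seen = set()      # values encountered at least once
    reported = set()  # values already yielded as duplicates
    for value in items:
        if value in seen:
            if value not in reported:
                reported.add(value)
                yield value
        else:
            seen.add(value)


# 2 occurs four times here but is still reported only once.
dupes = list(iter_dupes([1, 2, 2, 3, 4, 1, 5, 2, 6, 6, 2]))
print(dupes)
# => [2, 1, 6]
```

Because it is a generator, the caller can stop early (for example with `next(iter_dupes(data), None)`) without scanning the rest of the list.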

For lists that might contain lists:

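def gen_dupes(array):
    # Maps each value to True until it has been yielded as a duplicate, then False.
    unique = {}
    for value in array:
        # Lists are unhashable, so use a tuple copy as the dict key.
        is_list = isinstance(value, list)
        key = tuple(value) if is_list else value

        if key in unique:
            # Repeat occurrence: yield it only the first time it repeats.
            if unique[key]:
                unique[key] = False
                # Convert the key back to a list before yielding it.
                yield list(key) if is_list else key
        else:
            unique[key] = True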

array = [1, 2, 2, [1, 2], 3, 4, [1, 2], 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, [1, 2], 6]

2016-05-24 01:55:40

Other answers

Using pandas:

>>> import pandas as pd
>>> a = [1, 2, 1, 3, 3, 3, 0]
>>> pd.Series(a)[pd.Series(a).duplicated()].values
array([1, 3, 3])

2016-10-08 11:10:45

Despite its O(n log n) complexity, this seems somewhat competitive; see the benchmark below.

a = sorted(a)
dupes = list(set(a[::2]) & set(a[1::2]))

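Sorting brings duplicates next to each other, so each duplicated value ends up at both an even index and an odd index. A unique value sits at either an even or an odd index, never both. The intersection of the even-index values and the odd-index values is therefore exactly the set of duplicates.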
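
To make the even/odd argument concrete, here is a small worked example. The sample list and the names `s`, `evens` and `odds` are chosen only for illustration.

```python
a = [3, 1, 2, 1, 3, 3, 0]

s = sorted(a)    # [0, 1, 1, 2, 3, 3, 3]
evens = s[::2]   # indices 0, 2, 4, 6 -> [0, 1, 3, 3]
odds = s[1::2]   # indices 1, 3, 5    -> [1, 2, 3]

# 0 and 2 are unique, so each lands in only one slice;
# 1 and 3 are repeated, so each lands in both.
dupes = sorted(set(evens) & set(odds))
print(dupes)
# => [1, 3]
```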

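Benchmark results:

This uses MSeifert's benchmark, but only with the solutions from the accepted answer (the georgs), the slowest solutions, the fastest solution (excluding it_duplicates, since it does not deduplicate the duplicates it reports), and mine. Anything more would be too crowded, with colors too similar.

If modifying the given list is allowed, the first line could be a.sort(), which would be a bit faster. But the benchmark reuses the same list multiple times, so modifying it would skew the benchmark.

Apparently set(a[::2]).intersection(a[1::2]) would avoid creating a second set and be a bit faster, but it is also a bit longer.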
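
The two variants mentioned above could be wrapped up roughly as follows. The function names `sorted_dupes` and `sorted_dupes_inplace` are hypothetical, and no timing claims are made here; the sketch only shows that both give the same duplicates and which one mutates its argument.

```python
def sorted_dupes(values):
    """Return the duplicated values; the input list is left untouched."""
    ordered = sorted(values)  # sorted() returns a new list
    # intersection() accepts any iterable, so no second set is built explicitly
    return list(set(ordered[::2]).intersection(ordered[1::2]))


def sorted_dupes_inplace(values):
    """Same result, but sorts the caller's list in place."""
    values.sort()
    return list(set(values[::2]).intersection(values[1::2]))


data = [3, 1, 2, 1, 3, 3, 0]
print(sorted(sorted_dupes(data)))          # => [1, 3]
print(data)                                # => [3, 1, 2, 1, 3, 3, 0]

scratch = list(data)
print(sorted(sorted_dupes_inplace(scratch)))  # => [1, 3]
print(scratch)                             # => [0, 1, 1, 2, 3, 3, 3]
```

The result comes out of a set, so its order is not guaranteed; the example sorts it only to make the printed output predictable.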

2020-11-22 16:53:28

A very simple and quick way of finding dupes with a single iteration in Python:

testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']

testListDict = {}

for item in testList:
    try:
        testListDict[item] += 1
    except KeyError:
        # First time this item is seen.
        testListDict[item] = 1

print(testListDict)

The output is as follows:

>>> print(testListDict)
{'red': 2, 'blue': 3, 'green': 1}
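
The counting loop above produces a dict of counts rather than a list of duplicates. A minimal sketch of the remaining step, using the standard library's `collections.Counter` in place of the hand-written loop (the names `counts` and `dupes` are illustrative only):

```python
from collections import Counter

test_list = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Counter does the same tallying as the try/except loop.
counts = Counter(test_list)

# Keep only the items that occur more than once.
dupes = [item for item, n in counts.items() if n > 1]
print(dupes)
# => ['red', 'blue']
```

On Python 3.7 and later the result follows the order in which each item first appears in the list.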

This and more on my blog: http://www.howtoprogramwithpython.com

2016-06-21 02:32:19

We can use itertools.groupby to find all the items that have dups:

from itertools import groupby

myList = [2, 4, 6, 8, 4, 6, 12]
# when the list is sorted, groupby groups consecutive elements that are equal
for x, y in groupby(sorted(myList)):
    # list(y) returns all the occurrences of item x
    if len(list(y)) > 1:
        print(x)

The output will be:

4
6
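
Since the question asks for another list rather than printed output, the same `groupby` idea can be collected into a list. This is a sketch only; `my_list` and `dupes` are illustrative names.

```python
from itertools import groupby

my_list = [2, 4, 6, 8, 4, 6, 12]

# Count each run of equal values without materialising it as a list.
dupes = [
    key
    for key, group in groupby(sorted(my_list))
    if sum(1 for _ in group) > 1
]
print(dupes)
# => [4, 6]
```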

2017-07-21 16:42:35

I didn't see a solution that uses iterators only, so here we go.

This requires the list to be sorted, which may be the drawback here.

a = [1,2,3,2,1,5,6,5,5,5]
a.sort()
set(map(lambda x: x[0], filter(lambda x: x[0] == x[1], zip(a, a[1:]))))

{1, 2, 5}

You can easily check how fast this is on your machine, with a million potential duplicates, using this code:

First generate the data:

import random
from itertools import chain
a = list(chain(*[[n] * random.randint(1, 2) for n in range(1000000)]))

And run the test:

set(map(lambda x: x[0], filter(lambda x: x[0] == x[1], zip(a, a[1:]))))

Needless to say, this solution only works if the list is already sorted.
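
A short sketch of why the sort matters: comparing neighbours only catches duplicates that are adjacent. The helper name `adjacent_dupes` is made up for this example, and a set comprehension stands in for the `map`/`filter` chain.

```python
def adjacent_dupes(values):
    """Return values that are equal to their immediate successor."""
    return {x for x, y in zip(values, values[1:]) if x == y}


unsorted_list = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

# Only the run of 5s is adjacent, so 1 and 2 are missed.
print(adjacent_dupes(unsorted_list))
# => {5}

# After sorting, every duplicate sits next to its twin.
print(adjacent_dupes(sorted(unsorted_list)))
# => {1, 2, 5}
```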

2020-06-17 14:44:16
