如何在整数列表中找到重复项并创建重复项的另一个列表?


当前回答

list2 = [1, 2, 3, 4, 1, 2, 3]
lset = set()
[(lset.add(item), list2.append(item))
 for item in list2 if item not in lset]
print list(lset)

其他回答

另一种解决方案如下所示,不使用任何集合库。

a = [1,2,3,5,4,6,4,21,4,6,3,32,5,2,23,5]
duplicates = []

for i in a:
    if a.count(i) > 1 and i not in duplicates:
        duplicates.append(i)

print(duplicates)

输出是[2,3,5,4,6]

这里有很多答案,但我认为这是一个相对易于阅读和理解的方法:

def get_duplicates(sorted_list):
    duplicates = []
    last = sorted_list[0]
    for x in sorted_list[1:]:
        if x == last:
            duplicates.append(x)
        last = x
    return set(duplicates)

注:

如果您希望保留重复计数,请去掉强制转换 'set'在底部获得完整的列表 如果您更喜欢使用生成器,请将duplicate .append(x)替换为yield x和底部的return语句(您可以稍后强制转换为set)

不需要转换为列表,可能最简单的方法是如下所示。 在面试中,当他们要求不要使用集合时,这可能会很有用

a=[1,2,3,3,3]
dup=[]
for each in a:
  if each not in dup:
    dup.append(each)
print(dup)

======= else获取唯一值和重复值的2个单独列表

a=[1,2,3,3,3]
uniques=[]
dups=[]

for each in a:
  if each not in uniques:
    uniques.append(each)
  else:
    dups.append(each)
print("Unique values are below:")
print(uniques)
print("Duplicate values are below:")
print(dups)

你不需要计数,只需要该物品之前是否被看到过。把这个答案用在这个问题上:

def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list( seen_twice )

a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]

以防速度很重要,这里有一些时间安排:

# file: test.py
import collections

def thg435(l):
    return [x for x, y in collections.Counter(l).items() if y > 1]

def moooeeeep(l):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def RiteshKumar(l):
    return list(set([x for x in l if l.count(x) > 1]))

def JohnLaRooy(L):
    seen = set()
    seen2 = set()
    seen_add = seen.add
    seen2_add = seen2.add
    for item in L:
        if item in seen:
            seen2_add(item)
        else:
            seen_add(item)
    return list(seen2)

l = [1,2,3,2,1,5,6,5,5,5]*100

以下是结果:(做得好@JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop

有趣的是,除了计时本身,当使用pypy时,排名也略有变化。最有趣的是,基于counter的方法极大地受益于pypy的优化,而我建议的方法缓存方法似乎几乎没有任何效果。

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop

显然,这种效应与输入数据的“重复性”有关。我设置了l = [random.randrange(1000000) for I in xrange(10000)],得到了这些结果:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop

使用Set函数 如:-

arr=[1,4,2,5,2,3,4,1,4,5,2,3]
arr2=list(set(arr))
print(arr2)

输出:- [1,2,3,4,5]

使用array删除副本

eg:-

arr=[1,4,2,5,2,3,4,1,4,5,2,3]
arr3=[]
for i in arr:
    if(i not in arr3):
     arr3.append(i)
print(arr3)

输出: [1,4,2,5,3]

使用Lambda函数

eg:-

rem_duplicate_func=lambda arr:set(arr)
print(rem_duplicate_func(arr))

输出: {1,2,3,4,5}

从字典中删除重复值

eg:-

dict1={
    'car':["Ford","Toyota","Ford","Toyota"],
    'brand':["Mustang","Ranz","Mustang","Ranz"] } dict2={} for key,value in dict1.items():
    dict2[key]=set(value) print(dict2)

输出: {“车”:{“丰田”、“福特”},“品牌”:{“主攻”、“野马”}}

对称差异-删除重复元素

eg:-

set1={1,2,4,5}
set2={2,1,5,7}
rem_dup_ele=set1.symmetric_difference(set2)
print(rem_dup_ele)

输出: {4 7}