如何在整数列表中找到重复项并创建重复项的另一个列表?


当前回答

为了实现这个问题,我们可以使用多种不同的方法来解决它,这两种是常见的解决方案,但在实际场景中实现它们时,我们还必须考虑时间复杂性。

import random
import time

dupl_list = [random.randint(1,1000) for x in range(500)]
print("List with duplicate integers")
print (dupl_list)


#Method 1 
print("******************Method 1 *************")

def Repeat_num(x):
    _size = len(x)
    repeated = []
    for i in range(_size):
        # print(i)
        k = i + 1
        for j in range(k, _size):
            # print(j)
            if x[i] == x[j] and x[i] not in repeated:
                repeated.append(x[i])
    return repeated

start = time.time()
print(Repeat_num(dupl_list))
end = time.time()
print("The time of execution of above program is :",(end-start) * 10**3, "ms")

print("***************Method 2****************")

#method 2 - using count()
def repeast_count(dup_list):
  new = []
  for a in dup_list:
      # print(a)
      # checking the occurrence of elements
      n = dup_list.count(a)
      # if the occurrence is more than
      # one we add it to the output list
      if n > 1:
          if new.count(a) == 0:  # condition to check
              new.append(a)
  return new


start = time.time()
print(repeast_count(dupl_list))
end = time.time()
print("The time of execution of above program is :",(end-start) * 10**3, "ms")

# #输出示例:

List with duplicate integers
[5, 45, 28, 81, 32, 98, 8, 83, 47, 95, 41, 49, 4, 1, 85, 26, 38, 82, 54, 11]
******************Method 1 *************
[]
The time of execution of above program is : 1.1069774627685547 ms
***************Method 2****************
[]
The time of execution of above program is : 0.1881122589111328 ms

对于一般的理解,方法1是好的,但是对于真正的实现,我更喜欢方法2,因为它比方法1花费的时间更少。

其他回答

在列表中使用list.count()方法查找给定列表的重复元素

arr=[]
dup =[]
for i in range(int(input("Enter range of list: "))):
    arr.append(int(input("Enter Element in a list: ")))
for i in arr:
    if arr.count(i)>1 and i not in dup:
        dup.append(i)
print(dup)

集合。Counter是python 2.7中的新功能:


Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
a = [1,2,3,2,1,5,6,5,5,5]
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
Type "help", "copyright", "credits" or "license" for more information.
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'Counter'
>>> 

在早期版本中,你可以使用传统的字典:

a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1

print [x for x, y in d.items() if y > 1]

要删除重复项,请使用集合(a)。要打印副本,可以这样做:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]

请注意Counter并不是特别有效(计时),可能会在这里过度使用。Set会表现得更好。这段代码以源顺序计算一个唯一元素的列表:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

或者,更简洁地说:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]    

我不推荐后一种风格,因为它不清楚not seen.add(x)在做什么(set add()方法总是返回None,因此需要not)。

计算没有库的重复元素列表:

seen = set()
dupes = []

for x in a:
    if x in seen:
        dupes.append(x)
    else:
        seen.add(x)

或者,更简洁地说:

seen = set()
dupes = [x for x in a if x in seen or seen.add(x)]    

如果列表元素不可哈希,则不能使用set /dicts,必须使用二次时间解决方案(逐个比较)。例如:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print no_dupes # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print dupes # [[1], [3]]

有点晚了,但可能对一些人有帮助。 对于一个比较大的列表,我发现这个方法很适合我。

l=[1,2,3,5,4,1,3,1]
s=set(l)
d=[]
for x in l:
    if x in s:
        s.remove(x)
    else:
        d.append(x)
d
[1,3,1]

显示正确和所有重复,并保持秩序。

还有其他测试。当然要做……

set([x for x in l if l.count(x) > 1])

...代价太大了。使用下一个final方法大约快500倍(数组越长结果越好):

def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.iteritems() if val > 1}

    return result_d.keys()

只有2个循环,没有非常昂贵的l.count()操作。

下面是一个比较方法的代码。代码如下,输出如下:

dups_count: 13.368s # this is a function which uses l.count()
dups_count_dict: 0.014s # this is a final best function (of the 3 functions)
dups_count_counter: 0.024s # collections.Counter

测试代码:

import numpy as np
from time import time
from collections import Counter

class TimerCounter(object):
    def __init__(self):
        self._time_sum = 0

    def start(self):
        self.time = time()

    def stop(self):
        self._time_sum += time() - self.time

    def get_time_sum(self):
        return self._time_sum


def dups_count(l):
    return set([x for x in l if l.count(x) > 1])


def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.iteritems() if val > 1}

    return result_d.keys()


def dups_counter(l):
    counter = Counter(l)    

    result_d = {key: val for key, val in counter.iteritems() if val > 1}

    return result_d.keys()



def gen_array():
    np.random.seed(17)
    return list(np.random.randint(0, 5000, 10000))


def assert_equal_results(*results):
    primary_result = results[0]
    other_results = results[1:]

    for other_result in other_results:
        assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)


if __name__ == '__main__':
    dups_count_time = TimerCounter()
    dups_count_dict_time = TimerCounter()
    dups_count_counter = TimerCounter()

    l = gen_array()

    for i in range(3):
        dups_count_time.start()
        result1 = dups_count(l)
        dups_count_time.stop()

        dups_count_dict_time.start()
        result2 = dups_count_dict(l)
        dups_count_dict_time.stop()

        dups_count_counter.start()
        result3 = dups_counter(l)
        dups_count_counter.stop()

        assert_equal_results(result1, result2, result3)

    print 'dups_count: %.3f' % dups_count_time.get_time_sum()
    print 'dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum()
    print 'dups_count_counter: %.3f' % dups_count_counter.get_time_sum()