如何在整数列表中找到重复项并创建重复项的另一个列表?
当前回答
如果你不关心自己编写算法或使用库,Python 3.8一行代码:
l = [1,2,3,2,1,5,6,5,5,5]
res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]
print(res)
打印项目和计数:
[(1, 2), (2, 2), (5, 4)]
groupby接受一个分组函数,因此您可以以不同的方式定义分组,并根据需要返回额外的Tuple字段。
其他回答
为了实现这个问题,我们可以使用多种不同的方法来解决它,这两种是常见的解决方案,但在实际场景中实现它们时,我们还必须考虑时间复杂性。
import random
import time
dupl_list = [random.randint(1,1000) for x in range(500)]
print("List with duplicate integers")
print (dupl_list)
#Method 1
print("******************Method 1 *************")
def Repeat_num(x):
_size = len(x)
repeated = []
for i in range(_size):
# print(i)
k = i + 1
for j in range(k, _size):
# print(j)
if x[i] == x[j] and x[i] not in repeated:
repeated.append(x[i])
return repeated
start = time.time()
print(Repeat_num(dupl_list))
end = time.time()
print("The time of execution of above program is :",(end-start) * 10**3, "ms")
print("***************Method 2****************")
#method 2 - using count()
def repeast_count(dup_list):
new = []
for a in dup_list:
# print(a)
# checking the occurrence of elements
n = dup_list.count(a)
# if the occurrence is more than
# one we add it to the output list
if n > 1:
if new.count(a) == 0: # condition to check
new.append(a)
return new
start = time.time()
print(repeast_count(dupl_list))
end = time.time()
print("The time of execution of above program is :",(end-start) * 10**3, "ms")
# #输出示例:
List with duplicate integers
[5, 45, 28, 81, 32, 98, 8, 83, 47, 95, 41, 49, 4, 1, 85, 26, 38, 82, 54, 11]
******************Method 1 *************
[]
The time of execution of above program is : 1.1069774627685547 ms
***************Method 2****************
[]
The time of execution of above program is : 0.1881122589111328 ms
对于一般的理解,方法1是好的,但是对于真正的实现,我更喜欢方法2,因为它比方法1花费的时间更少。
我必须这样做,因为我挑战自己不使用其他方法:
def dupList(oldlist):
if type(oldlist)==type((2,2)):
oldlist=[x for x in oldlist]
newList=[]
newList=newList+oldlist
oldlist=oldlist
forbidden=[]
checkPoint=0
for i in range(len(oldlist)):
#print 'start i', i
if i in forbidden:
continue
else:
for j in range(len(oldlist)):
#print 'start j', j
if j in forbidden:
continue
else:
#print 'after Else'
if i!=j:
#print 'i,j', i,j
#print oldlist
#print newList
if oldlist[j]==oldlist[i]:
#print 'oldlist[i],oldlist[j]', oldlist[i],oldlist[j]
forbidden.append(j)
#print 'forbidden', forbidden
del newList[j-checkPoint]
#print newList
checkPoint=checkPoint+1
return newList
所以你的样本工作如下:
>>>a = [1,2,3,3,3,4,5,6,6,7]
>>>dupList(a)
[1, 2, 3, 4, 5, 6, 7]
你不需要计数,只需要该物品之前是否被看到过。把这个答案用在这个问题上:
def list_duplicates(seq):
seen = set()
seen_add = seen.add
# adds all elements it doesn't know yet to seen and all other to seen_twice
seen_twice = set( x for x in seq if x in seen or seen_add(x) )
# turn the set into a list (as requested)
return list( seen_twice )
a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]
以防速度很重要,这里有一些时间安排:
# file: test.py
import collections
def thg435(l):
return [x for x, y in collections.Counter(l).items() if y > 1]
def moooeeeep(l):
seen = set()
seen_add = seen.add
# adds all elements it doesn't know yet to seen and all other to seen_twice
seen_twice = set( x for x in l if x in seen or seen_add(x) )
# turn the set into a list (as requested)
return list( seen_twice )
def RiteshKumar(l):
return list(set([x for x in l if l.count(x) > 1]))
def JohnLaRooy(L):
seen = set()
seen2 = set()
seen_add = seen.add
seen2_add = seen2.add
for item in L:
if item in seen:
seen2_add(item)
else:
seen_add(item)
return list(seen2)
l = [1,2,3,2,1,5,6,5,5,5]*100
以下是结果:(做得好@JohnLaRooy!)
$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop
有趣的是,除了计时本身,当使用pypy时,排名也略有变化。最有趣的是,基于counter的方法极大地受益于pypy的优化,而我建议的方法缓存方法似乎几乎没有任何效果。
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop
显然,这种效应与输入数据的“重复性”有关。我设置了l = [random.randrange(1000000) for I in xrange(10000)],得到了这些结果:
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop
简单地检查,对于所有列表项,如果一个项的第一个索引等于该项的最后一个索引:
>>> lastindex = lambda arr, el: len(arr) - arr[::-1].index(el) -1
>>> is_duplicate = lambda arr, el: arr.index(el) != lastindex(arr, el)
>>> duplicates = lambda arr: [*set(x for x in arr if is_duplicate(arr, x))]
>>>
>>> a=[2,3,5,7,11,13, 2,17,7,7,17,18,3,19,5,2,7,48,48,2,19]
>>> duplicates(a)
[2, 3, 5, 7, 48, 17, 19]
>>>
通过检查出现的次数,简单地遍历列表中的每个元素,然后将它们添加到一个集,然后打印重复的元素。希望这能帮助到一些人。
myList = [2 ,4 , 6, 8, 4, 6, 12];
newList = set()
for i in myList:
if myList.count(i) >= 2:
newList.add(i)
print(list(newList))
## [4 , 6]
推荐文章
- 证书验证失败:无法获得本地颁发者证书
- 当使用pip3安装包时,“Python中的ssl模块不可用”
- 无法切换Python与pyenv
- Python if not == vs if !=
- 如何从scikit-learn决策树中提取决策规则?
- 为什么在Mac OS X v10.9 (Mavericks)的终端中apt-get功能不起作用?
- 将旋转的xtick标签与各自的xtick对齐
- 为什么元组可以包含可变项?
- 如何合并字典的字典?
- 如何创建类属性?
- 不区分大小写的“in”
- 在Python中获取迭代器中的元素个数
- 解析日期字符串并更改格式
- 使用try和。Python中的if
- 如何在Python中获得所有直接子目录