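Getting the number of elements in an iterator in Python

Is there, in general, an efficient way to know how many elements an iterator in Python contains, without iterating through each one and counting?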

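Current answer

There are two ways to get the length of "something" on a computer.

The first is to store a count, which requires everything that touches the file or data to update it (or a class that only exposes interfaces, but it comes down to the same thing).

The other is to iterate over it and count how big it is.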
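The two approaches can be contrasted in a small sketch. `CountedBag` and `count_by_walking` are hypothetical names invented for illustration: the class keeps a running count that every mutation updates, so its length is available in constant time, whereas a bare iterator has to be walked element by element.

```python
class CountedBag:
    """Hypothetical container that maintains its own element count."""

    def __init__(self):
        self._items = []
        self._count = 0  # stored count, updated on every mutation

    def add(self, item):
        self._items.append(item)
        self._count += 1

    def __len__(self):
        # Constant time: just report the stored count.
        return self._count

    def __iter__(self):
        return iter(self._items)


def count_by_walking(iterable):
    """Linear time: visit every element and tally it."""
    return sum(1 for _ in iterable)


bag = CountedBag()
for word in ("a", "b", "c"):
    bag.add(word)

stored = len(bag)                      # reads the stored count
walked = count_by_walking(iter(bag))   # walks the whole iterator
```

Both calls give the same number here; the difference is only in how much work each one does.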

2010-07-27 16:55:41

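Other answers

This is theoretically impossible: it is, in fact, the halting problem.

Proof

Assume, for contradiction, that a function len(g) could determine the length (or infinite length) of any generator g.

For any program P, convert P into a generator g(P): at every return or exit point in P, yield a value instead of returning it.

If len(g(P)) == infinity, then P does not halt.

This would solve the halting problem, which is known to be impossible (see Wikipedia). Contradiction.

Therefore, it is impossible to count the elements of a generic generator without iterating over it (that is, actually running the whole program).

More concretely, consider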

def g():
    while True:
        yield "more?"

Its length is infinite. There are infinitely many such generators.

2022-01-16 16:10:03

Regarding your original question, the answer is still that there is, in general, no way to know the length of an iterator in Python.

Given that your question is motivated by an application of the pysam library, I can give a more specific answer: I'm a contributor to pysam, and the definitive answer is that SAM/BAM files do not provide an exact count of aligned reads. Nor is this information easily available from a BAM index file. The best one can do is to estimate the approximate number of alignments by using the location of the file pointer after reading a number of alignments and extrapolating based on the total size of the file. This is enough to implement a progress bar, but not a method of counting alignments in constant time.
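The extrapolation idea can be sketched on an ordinary line-oriented file. This does not use pysam or any of its API; `estimate_record_count` and its `sample_size` parameter are hypothetical, and the sketch assumes uncompressed, newline-terminated records of roughly similar size.

```python
import os
import tempfile


def estimate_record_count(path, sample_size=100):
    """Estimate the number of newline-terminated records in a file.

    Reads up to sample_size records, notes how many bytes they consumed,
    and extrapolates to the full file size.
    """
    total_bytes = os.path.getsize(path)
    with open(path, "rb") as handle:
        records_read = 0
        for _ in range(sample_size):
            if not handle.readline():
                break  # reached end of file early
            records_read += 1
        bytes_read = handle.tell()  # file pointer after the sample
    if records_read == 0 or bytes_read == 0:
        return 0
    return round(records_read * total_bytes / bytes_read)


# Demo file: 1000 records of identical length.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    for i in range(1000):
        tmp.write(f"read{i:06d}\n".encode("ascii"))
    demo_path = tmp.name

estimate = estimate_record_count(demo_path)
os.remove(demo_path)
```

With records of varying length the result is only an approximation, which is fine for a progress bar but not for an exact count.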

2010-08-17 18:57:51

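Suppose you want to count the number of items without iterating through them, so that the iterator is not exhausted and can be used again later. This is possible with copy or deepcopy, provided the object supports copying (generators, for example, cannot be copied and raise TypeError). Note that range in the example is a re-iterable sequence rather than a one-shot iterator.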

import copy

def get_iter_len(iterator):
    return sum(1 for _ in copy.copy(iterator))

###############################################

iterator = range(0, 10)
print(get_iter_len(iterator))

if len(tuple(iterator)) > 1:
    print("Finding the length did not exhaust the iterator!")
else:
    print("oh no! it's all gone")

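The output is 10, followed by "Finding the length did not exhaust the iterator!"

Optionally (and inadvisably), you can shadow the built-in len function as follows: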

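import copy

def len(obj, *, len=len):
    # The keyword-only default captures the built-in len before it is shadowed.
    if hasattr(obj, "__len__"):
        return len(obj)
    if hasattr(obj, "__next__"):
        # Count a copy so the original iterator is not consumed; this raises
        # TypeError for iterators that cannot be copied, such as generators.
        return sum(1 for _ in copy.copy(obj))
    # Fall back to the built-in, which raises TypeError for unsupported objects.
    return len(obj)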
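The copy-based approach above depends on the iterator being copyable. The following sketch, run on CPython, shows one case where it works (a list iterator) and one where it does not (a generator); the variable names are only illustrative.

```python
import copy

# A list iterator can be copied; the copy advances independently.
list_iter = iter([10, 20, 30])
copied_count = sum(1 for _ in copy.copy(list_iter))  # counts the copy
remaining = list(list_iter)  # the original is still intact


def numbers():
    yield from (1, 2, 3)


# A generator cannot be copied, so the approach fails with TypeError.
try:
    copy.copy(numbers())
    generator_copyable = True
except TypeError:
    generator_copyable = False
```

So the trick is not a general answer to the question: for generators, the only way to count is still to consume them.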

2019-10-29 17:38:46

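Although it is not possible in general to do what was asked, it is still often useful to have a count of how many items were iterated over after having iterated over them. For that, you can use jaraco.itertools.Counter or something similar. Here is an example using Python 3 and rwt to load the package.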

$ rwt -q jaraco.itertools -- -q
>>> import jaraco.itertools
>>> items = jaraco.itertools.Counter(range(100))
>>> _ = list(items)
>>> items.count
100
>>> import random
>>> def gen(n):
...     for i in range(n):
...         if random.randint(0, 1) == 0:
...             yield i
... 
>>> items = jaraco.itertools.Counter(gen(100))
>>> _ = list(items)
>>> items.count
48
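If installing a package is not an option, the same idea fits in a few lines. This is a minimal sketch and not the jaraco.itertools implementation; `CountingIterator` is a hypothetical name.

```python
class CountingIterator:
    """Hypothetical wrapper that counts items as they are consumed."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._iterator)  # StopIteration propagates at the end
        self.count += 1
        return item


evens = CountingIterator(n for n in range(100) if n % 2 == 0)
consumed = list(evens)   # iterate as usual
seen = evens.count       # number of items that passed through
```

The count is only meaningful after (or during) iteration; before the first item is consumed it is simply 0.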

2017-08-04 20:05:35

This code should work:

>>> iter = (i for i in range(50))
>>> sum(1 for _ in iter)
50

Although it does iterate through each item and count them, it is the fastest way to do so.

It also works when the iterator has no items:

>>> sum(1 for _ in range(0))
0

Of course, it runs forever on an infinite input, so remember that iterators can be infinite:

>>> import itertools
>>> sum(1 for _ in itertools.count())
[nothing happens, forever]
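When the input might be infinite, one defensive option is to cap the count. The sketch below uses itertools.islice for that; `count_up_to` and its `limit` parameter are hypothetical names, and the cap is a workaround rather than a way to learn the true length.

```python
import itertools


def count_up_to(iterable, limit):
    """Count items, but stop after at most `limit` of them.

    Returns (count, hit_limit). hit_limit is True when the limit was
    reached, meaning the iterable may hold more items (possibly
    infinitely many).
    """
    count = sum(1 for _ in itertools.islice(iterable, limit))
    return count, count == limit


finite = count_up_to(iter(range(5)), 1000)      # stops when the input ends
endless = count_up_to(itertools.count(), 1000)  # stops at the cap
```

Like the plain sum, this still consumes whatever items it counts.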

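Also, be aware that doing this exhausts the iterator, and further attempts to use it will see no elements. This is an unavoidable consequence of Python's iterator design. If you want to keep the elements, you have to store them in a list or something similar.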
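Storing the elements first looks like this; the `squares` generator is just a stand-in for any one-shot iterator, and the cost is that every element is held in memory at once.

```python
def squares():
    for n in range(5):
        yield n * n


items = list(squares())  # consumes the generator once, keeps every element
length = len(items)      # constant-time length of the stored list
total = sum(items)       # the stored elements can still be reused
```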

2010-07-27 16:35:35
