我需要一个滚动窗口(又名滑动窗口)可迭代的序列/迭代器/生成器。(默认的Python迭代可以被认为是一种特殊情况,其中窗口长度为1。)我目前正在使用以下代码。我怎样才能做得更优雅和/或更有效?

def rolling_window(seq, window_size):
    it = iter(seq)
    win = [it.next() for cnt in xrange(window_size)] # First window
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

if __name__=="__main__":
    for w in rolling_window(xrange(6), 3):
        print w

"""Example output:   
   [0, 1, 2]
   [1, 2, 3]
   [2, 3, 4]
   [3, 4, 5]
"""

对于window_size == 2的特定情况(即,在序列中迭代相邻的重叠对),请参见如何从列表中迭代重叠(当前,下一个)值对?


当前回答

另一种从列表生成固定长度窗口的简单方法

from collections import deque

def window(ls,window_size=3):
    window = deque(maxlen=window_size)

    for element in ls:
        
        if len(window)==window_size:
            yield list(window)
        window.append(element)

ls = [0,1,2,3,4,5]

for w in window(ls):
    print(w)

其他回答

在Python 3.10中,我们有itertools.pairwise(iterable)函数来滑动包含两个元素的窗口:

医生是这样说的:

返回从输入可迭代对象中获取的连续重叠对。 输出迭代器中2元组的数量将比输入的数量少1。如果输入可迭代对象的值少于两个,则返回空值。 大致相当于: def成对(iterable): # pairwise('ABCDEFG')——> AB BC CD DE EF FG A, b = tee(可迭代) 下一(b,没有) 返回zip(a, b)

这似乎是为collections.deque定制的,因为您实际上有一个FIFO(添加到一端,从另一端删除)。然而,即使你使用列表,你也不应该切片两次;相反,您应该只从列表中弹出(0)并追加()新项。

下面是一个基于deque的优化实现:

from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

在我的测试中,它在大多数时候都轻松击败了这里发布的其他所有东西,尽管pillmuncher的tee版本在大可迭代对象和小窗口方面击败了它。在较大的窗口上,deque再次以原始速度领先。

Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque's strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher's). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.

为什么不

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

它被记录在Python文档中。 您可以轻松地将其扩展到更宽的窗口。

>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}\n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]

我使用下面的代码作为一个简单的滑动窗口,它使用生成器来大幅提高可读性。根据我的经验,到目前为止,它的速度足以用于生物信息学序列分析。

我把它包括在这里是因为我还没有看到这种方法被使用过。同样,我对它的比较性能没有任何评价。

def slidingWindow(sequence,winSize,step=1):
"""Returns a generator that will iterate through
the defined chunks of input sequence. Input sequence
must be sliceable."""

    # Verify the inputs
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(sequence):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")

    # Pre-compute number of chunks to emit
    numOfChunks = ((len(sequence)-winSize)/step)+1

    # Do the work
    for i in range(0,numOfChunks*step,step):
        yield sequence[i:i+winSize]