我需要一个滚动窗口(又名滑动窗口)可迭代的序列/迭代器/生成器。(默认的Python迭代可以被认为是一种特殊情况,其中窗口长度为1。)我目前正在使用以下代码。我怎样才能做得更优雅和/或更有效?
def rolling_window(seq, window_size):
it = iter(seq)
win = [it.next() for cnt in xrange(window_size)] # First window
yield win
for e in it: # Subsequent windows
win[:-1] = win[1:]
win[-1] = e
yield win
if __name__=="__main__":
for w in rolling_window(xrange(6), 3):
print w
"""Example output:
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
"""
对于window_size == 2的特定情况(即,在序列中迭代相邻的重叠对),请参见如何从列表中迭代重叠(当前,下一个)值对?
多个迭代器!
def window(seq, size, step=1):
# initialize iterators
iters = [iter(seq) for i in range(size)]
# stagger iterators (without yielding)
[next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
while(True):
yield [next(i) for i in iters]
# next line does nothing for step = 1 (skips iterations for step > 1)
[next(i) for i in iters for j in range(step-1)]
next(it)在序列结束时引发StopIteration,出于一些我无法理解的很酷的原因,yield语句在这里除外它,函数返回,忽略没有形成完整窗口的剩余值。
无论如何,这是目前为止行数最少的解决方案,它唯一的要求是seq实现__iter__或__getitem__,并且除了@dansalmo的解决方案之外,不依赖于itertools或集合:)
这似乎是为collections.deque定制的,因为您实际上有一个FIFO(添加到一端,从另一端删除)。然而,即使你使用列表,你也不应该切片两次;相反,您应该只从列表中弹出(0)并追加()新项。
下面是一个基于deque的优化实现:
from collections import deque
def window(seq, n=2):
it = iter(seq)
win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
yield win
append = win.append
for e in it:
append(e)
yield win
在我的测试中,它在大多数时候都轻松击败了这里发布的其他所有东西,尽管pillmuncher的tee版本在大可迭代对象和小窗口方面击败了它。在较大的窗口上,deque再次以原始速度领先。
Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque's strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher's). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.
这里是一个泛化,增加了对step, fillvalue参数的支持:
from collections import deque
from itertools import islice
def sliding_window(iterable, size=2, step=1, fillvalue=None):
if size < 0 or step < 1:
raise ValueError
it = iter(iterable)
q = deque(islice(it, size), maxlen=size)
if not q:
return # empty iterable or size == 0
q.extend(fillvalue for _ in range(size - len(q))) # pad to size
while True:
yield iter(q) # iter() to avoid accidental outside modifications
try:
q.append(next(it))
except StopIteration: # Python 3.5 pep 479 support
return
q.extend(next(it, fillvalue) for _ in range(step - 1))
它每次产生块大小的项目,每次迭代滚动步骤位置,在必要时用fillvalue填充每个块。示例:size=4, step=3, fillvalue='*':
[a b c d]e f g h i j k l m n o p q r s t u v w x y z
a b c[d e f g]h i j k l m n o p q r s t u v w x y z
a b c d e f[g h i j]k l m n o p q r s t u v w x y z
a b c d e f g h i[j k l m]n o p q r s t u v w x y z
a b c d e f g h i j k l[m n o p]q r s t u v w x y z
a b c d e f g h i j k l m n o[p q r s]t u v w x y z
a b c d e f g h i j k l m n o p q r[s t u v]w x y z
a b c d e f g h i j k l m n o p q r s t u[v w x y]z
a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]
有关step参数的用例示例,请参见用python有效地处理大型.txt文件。
为什么不
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return zip(a, b)
它被记录在Python文档中。
您可以轻松地将其扩展到更宽的窗口。
deque窗口的一个轻微修改版本,使其成为一个真正的滚动窗口。因此,它开始只填充一个元素,然后增长到它的最大窗口大小,然后缩小,因为它的左边缘接近结束:
from collections import deque
def window(seq, n=2):
it = iter(seq)
win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
yield win
append = win.append
for e in it:
append(e)
yield win
for _ in xrange(len(win)-1):
win.popleft()
yield win
for wnd in window(range(5), n=3):
print(list(wnd))
这给了
[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]