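How do I split a list into equally-sized chunks?

How do I split a list of arbitrary length into equally-sized chunks?

See also How to iterate over a list in chunks, if the result will be used directly in a loop and does not need to be stored.

For the same question with a string input, see Split string every nth character?. The same techniques generally apply, though there are some variations.

Current answer

Yet another solution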

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print(chunk)
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>>

2017-04-17 15:38:56

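Other answers

I realise this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the more complex suggestions, and it only uses slicing: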

def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print(list(chunk))
... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...

2012-08-27 22:58:05

An abstraction would be

l = [1,2,3,4,5,6,7,8,9]
n = 3
outList = []
for i in range(n, len(l) + n, n):
    outList.append(l[i-n:i])

print(outList)

This will print:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
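
The loop above can also be wrapped in a reusable function. The following is a minimal sketch, assuming the input supports len() and slicing (a list, tuple or str); the name split_into_chunks is made up for this illustration and does not come from the answer.

```python
def split_into_chunks(seq, n):
    """Return a list of consecutive slices of seq, each of length n.

    The last slice is shorter when len(seq) is not a multiple of n.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Step through the start index of every chunk and slice from there.
    return [seq[i:i + n] for i in range(0, len(seq), n)]


print(split_into_chunks([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
# [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(split_into_chunks([1, 2, 3, 4, 5, 6, 7], 3))
# [[1, 2, 3], [4, 5, 6], [7]]
```

Slicing past the end of a sequence is safe in Python, so no special handling is needed for the final, shorter chunk.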

2020-06-29 17:54:13

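I have a solution below that does work, but more important than that solution are a few comments on the other approaches. First, a good solution should not require that the sub-iterators be looped through in order. If I run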

g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)

the appropriate output for the last command is

 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

not

[]

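as most of the itertools-based solutions here return. This is not just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly entered data in which the order of blocks of 5 was swapped pairwise, i.e. the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements, not a sublist). This consumer would look at the claimed behaviour of the grouping function and not hesitate to write a loop like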

i = 0
out = []
for it in paged_iter(data, 5):
    if (i % 2 == 0):
         swapped = it
    else: 
         out += list(it)
         out += list(swapped)
    i = i + 1

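This will produce mysteriously wrong results if you sneakily assume that sub-iterators are always fully consumed in order. It gets even worse if you want to interleave elements from the chunks.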
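
The out-of-order failure described above can be reproduced with a short sketch. groupby_chunks is a hypothetical helper written only for this illustration (it is not one of the answers on this page); it is built on itertools.groupby, whose group iterators share the underlying iterator and become invalid once the next group is requested.

```python
from itertools import count, groupby


def groupby_chunks(iterable, n):
    """Yield lazy sub-iterators of n items each (hypothetical helper).

    All groups share one underlying iterator, so a group is only
    usable until the next group has been requested.
    """
    counter = count()
    # The key changes every n items, which starts a new group.
    for _, group in groupby(iterable, key=lambda _: next(counter) // n):
        yield group


g = groupby_chunks(range(50), 11)
i0 = next(g)
i1 = next(g)
print(list(i1))  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
print(list(i0))  # [] because i0 was invalidated when i1 was requested
```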

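Second, a fair number of the suggested solutions implicitly rely on the iterable having a deterministic order (a set, for example, does not), and although some of the solutions using islice may be fine, this worries me.

Third, the itertools grouper approach works, but the recipe relies on internal behaviour of the zip_longest (or zip) functions that is not part of their published behaviour. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in the order next(i0), next(i1), ... next(in) before starting over. Since grouper passes n copies of the same iterator object, it relies on this behaviour.

Finally, while the solution below can be improved, if you accept the criticism above of the assumption that sub-iterators are accessed in order and fully consumed, then without that assumption one MUST store the elements of each sub-iterator somewhere, either implicitly (via the call chain) or explicitly (via deques or another data structure). So do not bother wasting time (as I did) assuming you can get around this with some clever trick.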

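import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while True:
            # Buffer each page in its own deque so that it stays valid
            # even after later pages have been requested.
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        # Emit the final partial page, skipping it if it is empty.
        if deq:
            yield (i for i in deq)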
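
The storage requirement described above can also be met by materialising each chunk eagerly. The following is a minimal sketch of that alternative, not the author's method; the name chunked_tuples is made up for this illustration. Each chunk is copied into a tuple, so chunks can be read in any order.

```python
from itertools import islice


def chunked_tuples(iterable, n):
    """Yield tuples of up to n items; works with any iterable."""
    it = iter(iterable)
    while True:
        # tuple() stores the chunk, so it no longer depends on `it`.
        chunk = tuple(islice(it, n))
        if not chunk:
            return  # source exhausted, no trailing empty chunk
        yield chunk


g = chunked_tuples(range(50), 11)
i0 = next(g)
i1 = next(g)
print(list(i1))  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
print(list(i0))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The cost is that every chunk is held in memory in full, which is the trade-off the answer argues cannot be avoided.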

2017-01-11 09:18:53

I came up with the following solution, which works with any iterable and does not create temporary list objects. Note that this version is for Python 2.x:

def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return

    while not stop:
        yield _next_chunk()

for it in chunked(xrange(16), 4):
   print list(it)

Output:

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15] 
[]

As you can see, if len(iterable) % size == 0, we get an extra empty iterator object at the end. But I do not think that is a big problem.
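
If the trailing empty iterator matters, one way to avoid it is to pull the first item of each chunk before yielding. The following is a Python 3 sketch of that idea, not the code from this answer; the name chunked_no_empty is hypothetical, and the sketch assumes each chunk is fully consumed before the next one is requested.

```python
from itertools import chain, islice


def chunked_no_empty(iterable, size):
    """Yield lazy chunks of up to `size` items without a trailing empty one."""
    it = iter(iterable)
    # The for loop only runs while the source still has an item,
    # so a chunk is started only when it has at least one element.
    for first in it:
        yield chain([first], islice(it, size - 1))


for chunk in chunked_no_empty(range(16), 4):
    print(list(chunk))
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9, 10, 11]
# [12, 13, 14, 15]
```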

2015-09-18 17:54:39

I was curious about the performance of the different approaches, so here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

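def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            first = next(batchiter)
        except StopIteration:
            # Source exhausted; an uncaught StopIteration here becomes RuntimeError on Python 3.7+.
            return
        yield chain([first], batchiter)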


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper('abcdefg', 3, 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
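
Single time.time() measurements like the ones above can be noisy. As a rough sketch of a more repeatable measurement, the standard timeit module can run each candidate several times and keep the best result. The helper name time_chunker and the repeat counts are assumptions chosen for this illustration; each chunk is fully consumed so that lazy and eager approaches do comparable work.

```python
import timeit


def chunks(seq, n):
    """Yield successive n-sized slices from seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def time_chunker(func, data, size, repeat=3, number=3):
    """Return the best per-call time in seconds for func(data, size)."""
    # Consume every chunk so that generators actually do their work.
    timer = timeit.Timer(lambda: [list(c) for c in func(data, size)])
    return min(timer.repeat(repeat=repeat, number=number)) / number


data = list(range(298937))
print("chunks: %.6f s" % time_chunker(chunks, data, 7))
```

Taking the minimum over several repeats reduces the influence of other processes running on the machine; absolute numbers will still differ between machines and Python versions.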

2018-01-07 08:58:54
