Python的切片表示法是如何工作的?也就是说:当我编写[x:y:z]、a[:]、a]::2]等代码时,我如何理解哪些元素最终会出现在切片中?请在适当的地方附上参考资料。


另请参见:为什么切片和范围上限是互斥的?


当前回答

关于序列的索引,需要记住的重要思想是

非负指数从序列中的第一项开始;负索引从序列的最后一项开始(因此仅适用于有限序列)。

换言之,负索引右移序列长度:

              0   1   2   3   4   5   6   7   ...
            -------------------------
            | a | b | c | d | e | f |
            -------------------------
...  -8  -7  -6  -5  -4  -3  -2  -1

考虑到这一点,订阅和切片很简单。

订阅

订阅使用以下语法:*

sequence[index]

订阅在索引处选择序列中的单个项目:

>>> 'abcdef'[0]
'a'
>>> 'abcdef'[-6]
'a'

订阅引发超出范围的索引的IndexError:

>>> 'abcdef'[100]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

切片

切片使用以下语法:**

sequence[start:stop:step]

切片选择序列中的一系列项目,从包含开始到不包含结束:

>>> 'abcdef'[0:2:1]
'ab'
>>> 'abcdef'[0:-4:1]
'ab'
>>> 'abcdef'[-6:-4:1]
'ab'
>>> 'abcdef'[-6:2:1]
'ab'
>>> 'abcdef'[1:-7:-1]
'ba'
>>> 'abcdef'[-5:-7:-1]
'ba'

切片默认为序列中最完整的项目范围,因此它对省略或等于“无”的任何开始、停止或步骤使用以下默认值:***

步骤默认为1;如果步骤为正start默认为0(第一项索引),stop默认为start+len(序列)(最后一项索引加一);如果步骤为负start默认为-1(最后一项索引),stop默认为start-len(序列)(第一项索引减一)。

>>> 'abcdef'[0:6:1]
'abcdef'
>>> 'abcdef'[::]
'abcdef'
>>> 'abcdef'[-1:-7:-1]
'fedcba'
>>> 'abcdef'[::-1]
'fedcba'

切片为等于零的步骤引发ValueError:

>>> 'abcdef'[::0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: slice step cannot be zero

切片不会为超出范围的开始或停止引发IndexError(与订阅相反):

>>> 'abcdef'[-100:100]
'abcdef'

*表达式sequence[index]和sequence__getitem_(索引)是等效的。

**表达式sequence[start:stop:step]、sequence[Sslice(start,stop,step)]和sequence__getitem_(slice(start,stop,step))是等效的,其中内置类slice实例包start,stop和step。

***表达式sequence[:]、sequence[::]和sequence[None:None:None]使用start、stop和step的默认值。

其他回答

我有点沮丧,因为找不到一个准确描述切片功能的在线源代码或Python文档。

我接受了Aaron Hall的建议,阅读了CPython源代码的相关部分,并编写了一些Python代码,这些代码执行切片与CPython中的切片类似。我已经用Python 3对整数列表进行了数百万次随机测试。

您可能会发现我的代码中对CPython中相关函数的引用很有用。

def slicer(x, start=None, stop=None, step=None):
    """ Return the result of slicing list x.  

    See the part of list_subscript() in listobject.c that pertains 
    to when the indexing item is a PySliceObject.
    """

    # Handle slicing index values of None, and a step value of 0.
    # See PySlice_Unpack() in sliceobject.c, which
    # extracts start, stop, step from a PySliceObject.
    maxint = 10000000       # A hack to simulate PY_SSIZE_T_MAX
    if step is None:
        step = 1
    elif step == 0:
        raise ValueError('slice step cannot be zero')

    if start is None:
        start = maxint if step < 0 else 0
    if stop is None:
        stop = -maxint if step < 0 else maxint

    # Handle negative slice indexes and bad slice indexes.
    # Compute number of elements in the slice as slice_length.
    # See PySlice_AdjustIndices() in sliceobject.c
    length = len(x)
    slice_length = 0

    if start < 0:
        start += length
        if start < 0:
            start = -1 if step < 0 else 0
    elif start >= length:
        start = length - 1 if step < 0 else length

    if stop < 0:
        stop += length
        if stop < 0:
            stop = -1 if step < 0 else 0
    elif stop > length:
        stop = length - 1 if step < 0 else length

    if step < 0:
        if stop < start:
            slice_length = (start - stop - 1) // (-step) + 1
    else:
        if start < stop:
            slice_length = (stop - start - 1) // step + 1

    # Cases of step = 1 and step != 1 are treated separately
    if slice_length <= 0:
        return []
    elif step == 1:
        # See list_slice() in listobject.c
        result = []
        for i in range(stop - start):
            result.append(x[i+start])
        return result
    else:
        result = []
        cur = start
        for i in range(slice_length):
            result.append(x[cur])
            cur += step
        return result
#!/usr/bin/env python

def slicegraphical(s, lista):

    if len(s) > 9:
        print """Enter a string of maximum 9 characters,
    so the printig would looki nice"""
        return 0;
    # print " ",
    print '  '+'+---' * len(s) +'+'
    print ' ',
    for letter in s:
        print '| {}'.format(letter),
    print '|'
    print " ",; print '+---' * len(s) +'+'

    print " ",
    for letter in range(len(s) +1):
        print '{}  '.format(letter),
    print ""
    for letter in range(-1*(len(s)), 0):
        print ' {}'.format(letter),
    print ''
    print ''


    for triada in lista:
        if len(triada) == 3:
            if triada[0]==None and triada[1] == None and triada[2] == None:
                # 000
                print s+'[   :   :   ]' +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] == None and triada[1] == None and triada[2] != None:
                # 001
                print s+'[   :   :{0:2d} ]'.format(triada[2], '','') +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] == None and triada[1] != None and triada[2] == None:
                # 010
                print s+'[   :{0:2d} :   ]'.format(triada[1]) +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] == None and triada[1] != None and triada[2] != None:
                # 011
                print s+'[   :{0:2d} :{1:2d} ]'.format(triada[1], triada[2]) +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] != None and triada[1] == None and triada[2] == None:
                # 100
                print s+'[{0:2d} :   :   ]'.format(triada[0]) +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] != None and triada[1] == None and triada[2] != None:
                # 101
                print s+'[{0:2d} :   :{1:2d} ]'.format(triada[0], triada[2]) +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] != None and triada[1] != None and triada[2] == None:
                # 110
                print s+'[{0:2d} :{1:2d} :   ]'.format(triada[0], triada[1]) +' = ', s[triada[0]:triada[1]:triada[2]]
            elif triada[0] != None and triada[1] != None and triada[2] != None:
                # 111
                print s+'[{0:2d} :{1:2d} :{2:2d} ]'.format(triada[0], triada[1], triada[2]) +' = ', s[triada[0]:triada[1]:triada[2]]

        elif len(triada) == 2:
            if triada[0] == None and triada[1] == None:
                # 00
                print s+'[   :   ]    ' + ' = ', s[triada[0]:triada[1]]
            elif triada[0] == None and triada[1] != None:
                # 01
                print s+'[   :{0:2d} ]    '.format(triada[1]) + ' = ', s[triada[0]:triada[1]]
            elif triada[0] != None and triada[1] == None:
                # 10
                print s+'[{0:2d} :   ]    '.format(triada[0]) + ' = ', s[triada[0]:triada[1]]
            elif triada[0] != None and triada[1] != None:
                # 11
                print s+'[{0:2d} :{1:2d} ]    '.format(triada[0],triada[1]) + ' = ', s[triada[0]:triada[1]]

        elif len(triada) == 1:
            print s+'[{0:2d} ]        '.format(triada[0]) + ' = ', s[triada[0]]


if __name__ == '__main__':
    # Change "s" to what ever string you like, make it 9 characters for
    # better representation.
    s = 'COMPUTERS'

    # add to this list different lists to experement with indexes
    # to represent ex. s[::], use s[None, None,None], otherwise you get an error
    # for s[2:] use s[2:None]

    lista = [[4,7],[2,5,2],[-5,1,-1],[4],[-4,-6,-1], [2,-3,1],[2,-3,-1], [None,None,-1],[-5,None],[-5,0,-1],[-5,None,-1],[-1,1,-2]]

    slicegraphical(s, lista)

你可以运行这个脚本并进行实验,下面是我从脚本中获得的一些示例。

  +---+---+---+---+---+---+---+---+---+
  | C | O | M | P | U | T | E | R | S |
  +---+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7   8   9   
 -9  -8  -7  -6  -5  -4  -3  -2  -1 

COMPUTERS[ 4 : 7 ]     =  UTE
COMPUTERS[ 2 : 5 : 2 ] =  MU
COMPUTERS[-5 : 1 :-1 ] =  UPM
COMPUTERS[ 4 ]         =  U
COMPUTERS[-4 :-6 :-1 ] =  TU
COMPUTERS[ 2 :-3 : 1 ] =  MPUT
COMPUTERS[ 2 :-3 :-1 ] =  
COMPUTERS[   :   :-1 ] =  SRETUPMOC
COMPUTERS[-5 :   ]     =  UTERS
COMPUTERS[-5 : 0 :-1 ] =  UPMO
COMPUTERS[-5 :   :-1 ] =  UPMOC
COMPUTERS[-1 : 1 :-2 ] =  SEUM
[Finished in 0.9s]

当使用否定步骤时,请注意答案向右移动1。

我不认为Python教程图(在各种其他答案中引用)是好的,因为这个建议适用于积极的步幅,但不适用于消极的步幅。

这是一个图表:

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1

从图中,我希望[-4,-6,-1]是yP,但它是ty。

>>> a = "Python"
>>> a[2:4:1] # as expected
'th'
>>> a[-4:-6:-1] # off by 1
'ty'

始终有效的方法是在字符或槽中思考,并将索引用作半开区间——如果是正步幅,则右开,如果是负步幅,那么左开。

这样,我可以将[-4:-6:-1]看作是区间术语中的(-6,-4])。

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5  
  -6  -5  -4  -3  -2  -1

 +---+---+---+---+---+---+---+---+---+---+---+---+
 | P | y | t | h | o | n | P | y | t | h | o | n |
 +---+---+---+---+---+---+---+---+---+---+---+---+
  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5  

我的大脑似乎很乐意接受lst[开始:结束]包含开始项。我甚至可以说这是一个“自然的假设”。

但偶尔会有一种怀疑悄悄出现,我的大脑会要求我保证它不包含结尾元素。

在这些时刻,我依靠这个简单的定理:

for any n,    lst = lst[:n] + lst[n:]

这个漂亮的属性告诉我,lst[start:end]不包含end-th项,因为它位于lst[end:]中。

注意,这个定理对任何n都是正确的。例如,您可以检查

lst = range(10)
lst[:-42] + lst[-42:] == lst

返回True。

语法为:

a[start:stop]  # items start through stop-1
a[start:]      # items start through the rest of the array
a[:stop]       # items from the beginning through stop-1
a[:]           # a copy of the whole array

还有一个步长值,可用于上述任何一项:

a[start:stop:step] # start through not past stop, by step

要记住的关键点是:stop值表示不在所选切片中的第一个值。因此,停止和开始之间的区别是所选元素的数量(如果步骤为1,则为默认值)。

另一个特点是start或stop可以是负数,这意味着它从数组的末尾开始计数,而不是从开始计数。因此:

a[-1]    # last item in the array
a[-2:]   # last two items in the array
a[:-2]   # everything except the last two items

类似地,步骤可以是负数:

a[::-1]    # all items in the array, reversed
a[1::-1]   # the first two items, reversed
a[:-3:-1]  # the last two items, reversed
a[-3::-1]  # everything except the last two items, reversed

如果项目比你要求的少,Python对程序员很友好。例如,如果您请求一个[:-2],而一个只包含一个元素,则会得到一个空列表而不是一个错误。有时你会更喜欢错误,所以你必须意识到这可能会发生。

与切片对象的关系

切片对象可以表示切片操作,即:

a[start:stop:step]

相当于:

a[slice(start, stop, step)]

根据参数的数量,切片对象的行为也略有不同,类似于range(),即切片(stop)和切片(start,stop[,step])都受支持。要跳过指定给定参数,可以使用None,例如[start:]等同于[sslice(start,None)]或[::-1]等同于[Sslice(None,None,-1)]。

虽然基于:的表示法对简单切片非常有用,但slice()对象的显式使用简化了切片的编程生成。