I have a list of strings that I want to sort in natural alphabetical order.
For example, the following list is in natural order (what I want):
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
And here is the "sorted" version of the list above (what I get from sorted()):
['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
I am looking for a sort function that behaves like the first one.
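To make the difference concrete, here is a small sketch of why the built-in sorted() produces the second list. The variable names are only illustrative. Python compares strings character by character using code points, so uppercase letters sort before lowercase ones and '10' sorts before '9'.

```python
names = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

# Uppercase letters have smaller code points than lowercase ones.
print(ord('E'), ord('e'))   # 69 101

# Comparison is character by character, so 'elm10' < 'elm9' because '1' < '9'.
print('elm10' < 'elm9')     # True

plain = sorted(names)
print(plain)
# ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
```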
Current answer
The approach I use is padzero_with_lower, defined as follows:
import re
def padzero_with_lower(s):
    # Left-pad every run of digits with zeros to width 10, then lowercase.
    return re.sub(r'\d+', lambda m: m.group(0).rjust(10, '0'), s).lower()
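The function works in two steps:
First, it finds each run of digits, of any length, and left-pads it with zeros to a sufficiently large width, for example 10. Then it converts the string to lowercase.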
Here is a usage example:
print(padzero_with_lower('file1.txt')) # file0000000001.txt
print(padzero_with_lower('file12.txt')) # file0000000012.txt
print(padzero_with_lower('file23.txt')) # file0000000023.txt
print(padzero_with_lower('file123.txt')) # file0000000123.txt
print(padzero_with_lower('file301.txt')) # file0000000301.txt
print(padzero_with_lower('Dir2/file15.txt')) # dir0000000002/file0000000015.txt
print(padzero_with_lower('dir2/file123.txt')) # dir0000000002/file0000000123.txt
print(padzero_with_lower('dir15/file2.txt')) # dir0000000015/file0000000002.txt
print(padzero_with_lower('Dir15/file15.txt')) # dir0000000015/file0000000015.txt
print(padzero_with_lower('elm0')) # elm0000000000
print(padzero_with_lower('elm1')) # elm0000000001
print(padzero_with_lower('Elm2')) # elm0000000002
print(padzero_with_lower('elm9')) # elm0000000009
print(padzero_with_lower('elm10')) # elm0000000010
print(padzero_with_lower('Elm11')) # elm0000000011
print(padzero_with_lower('Elm12')) # elm0000000012
print(padzero_with_lower('elm13')) # elm0000000013
Having tested the function, we can now use it as our sort key.
lis = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
lis.sort(key=padzero_with_lower)
print(lis)
# Output: ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
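One caveat worth illustrating: a fixed pad width only works while no digit run is longer than that width. The sketch below is not from the answer. It exposes the width as a parameter and adds a hypothetical helper, natural_sorted_padded, which pads to the longest digit run found in the input.

```python
import re

def padzero_with_lower(s, width=10):
    # Same idea as the answer, with the pad width exposed as a parameter.
    return re.sub(r'\d+', lambda m: m.group(0).rjust(width, '0'), s).lower()

def natural_sorted_padded(strings):
    # Hypothetical helper: pad to the longest digit run in the input.
    strings = list(strings)
    width = max(
        (len(run) for s in strings for run in re.findall(r'\d+', s)),
        default=1,
    )
    return sorted(strings, key=lambda s: padzero_with_lower(s, width))

long_id = 'id1' + '0' * 11    # 12-digit number
short_id = 'id2' + '0' * 10   # 11-digit number, numerically smaller

ids = [long_id, short_id, 'id3']
fixed = sorted(ids, key=padzero_with_lower)   # width 10 is too small here
adaptive = natural_sorted_padded(ids)

print(fixed == ['id3', long_id, short_id])     # True: 12-digit id sorts too early
print(adaptive == ['id3', short_id, long_id])  # True: numeric order restored
```

For short numbers such as those in the question, the fixed width of 10 is sufficient.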
Other answers
Try this:
import re
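def natural_sort(l):
    # Digit runs become ints; everything else is lowercased.
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    # Split on digit runs, keeping them thanks to the capturing group.
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)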
Output:
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
Code adapted from: Sorting for Humans: Natural Sort Order.
>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
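The one-liner above assumes that lst holds the list from the question and that every string ends in digits. Otherwise re.findall returns an empty list and the [0] lookup raises IndexError. It also ignores the text before the number. The sketch below uses a hypothetical helper, trailing_number, to show both points. Returning -1 for strings without a trailing number is an arbitrary choice made for this example.

```python
import re

lst = ['elm13', 'Elm2', 'elm0', 'Elm11', 'elm10', 'elm1', 'elm9', 'Elm12']

def trailing_number(s):
    # Hypothetical helper: the integer at the end of s, or -1 if there is none.
    match = re.search(r'\d+$', s)
    return int(match.group()) if match else -1

by_number = sorted(lst, key=trailing_number)
print(by_number)
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

# The prefix is ignored, so unrelated names interleave by number alone.
mixed = sorted(['oak2', 'elm10', 'elm1', 'readme'], key=trailing_number)
print(mixed)
# ['readme', 'elm1', 'oak2', 'elm10']
```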
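For the record, here is another variant of Mark Byers's simple solution, similar to the one suggested by Walter Tross, that avoids calling isdigit(). This not only makes it faster, it also avoids a possible problem: isdigit() treats more Unicode characters as digits than the regex \d+ does.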
import re
from itertools import cycle
_re_digits = re.compile(r"(\d+)")
def natural_comparison_key(key):
    return tuple(
        int(part) if is_digit else part
        for part, is_digit in zip(_re_digits.split(key), cycle((False, True)))
    )
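Two details behind this variant can be checked directly. The snippet below is only an illustration, using SUPERSCRIPT TWO as a sample character. It shows a character that isdigit() accepts but \d and int() do not. It also shows that re.split with a capturing group always alternates between non-digit and digit parts, which is why pairing the parts with cycle((False, True)) works.

```python
import re

s = '²'  # SUPERSCRIPT TWO: a "digit" for str.isdigit(), but not a decimal digit
print(s.isdigit())                      # True
print(re.fullmatch(r'\d+', s) is None)  # True: \d does not match it

try:
    int(s)
    converted = True
except ValueError:
    converted = False
print(converted)                        # False: int() rejects it

# With a capturing group, split() returns text, digits, text, digits, ...
# and starts with a (possibly empty) text part.
print(re.split(r'(\d+)', 'elm10b'))     # ['elm', '10', 'b']
print(re.split(r'(\d+)', '10b'))        # ['', '10', 'b']
```

Note that this variant does not lowercase the text parts, so it is case-sensitive unless the caller lowercases the input first.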
Based on the answers here, I wrote a natural_sorted function that behaves like the built-in sorted:
# Copyright (C) 2018, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
import re
def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``. The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value. If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)
The source code is also available in my GitHub snippets repository: https://github.com/bdrung/snippets/blob/master/natural_sorted.py
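A short usage sketch may help show what the key and reverse arguments do. To keep the example runnable on its own, it restates the function in condensed form without the docstring. The behaviour is intended to be the same, but treat it as an illustration rather than the author's exact code.

```python
import re

_digits = re.compile(r"(\d+)")

def natural_sorted(iterable, key=None, reverse=False):
    # Condensed restatement of the function above, for illustration only.
    def alphanum_key(element):
        text = key(element) if key else element
        return [int(c) if c.isdigit() else c for c in _digits.split(text)]
    return sorted(iterable, key=alphanum_key, reverse=reverse)

names = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

# Default: elements are compared directly, so 'Elm...' sorts before 'elm...'.
case_sensitive = natural_sorted(names)
print(case_sensitive)
# ['Elm2', 'Elm11', 'Elm12', 'elm0', 'elm1', 'elm9', 'elm10', 'elm13']

# key=str.lower gives the order asked for in the question.
case_insensitive = natural_sorted(names, key=str.lower)
print(case_insensitive)
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

descending = natural_sorted(names, key=str.lower, reverse=True)
print(descending)
# ['elm13', 'Elm12', 'Elm11', 'elm10', 'elm9', 'Elm2', 'elm1', 'elm0']
```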
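Here is another version of Mark Byers's answer. This version shows how to pass in the name of an attribute that is used as the sort value for the objects in the list.
import re
def natural_sort(l, attrib):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    # Look up the attribute by name on each object, then split it on digit runs.
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key.__dict__[attrib])]
    return sorted(l, key=alphanum_key)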
results = natural_sort(albums, 'albumid')
Here albums is a list of Album instances, and albumid is a string attribute that nominally contains a number.
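As a runnable sketch of this idea, the example below defines a hypothetical Album dataclass (the real class is not shown in the answer) and uses getattr instead of __dict__. Unlike a __dict__ lookup, getattr also works for properties and for classes that define __slots__. The function name natural_sort_by_attr is likewise made up for this example.

```python
import re
from dataclasses import dataclass

@dataclass
class Album:
    # Hypothetical stand-in for the Album class mentioned above.
    albumid: str

def natural_sort_by_attr(items, attrib):
    def alphanum_key(item):
        parts = re.split(r'(\d+)', getattr(item, attrib))
        # Odd positions hold the captured digit runs; even positions hold text.
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
    return sorted(items, key=alphanum_key)

albums = [Album('alb10'), Album('Alb2'), Album('alb1')]
ordered = [a.albumid for a in natural_sort_by_attr(albums, 'albumid')]
print(ordered)
# ['alb1', 'Alb2', 'alb10']
```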