我有一个字符串列表,我想执行一个自然的字母排序。

例如,下面的列表是自然排序(我想要的):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

下面是上面列表的“排序”版本(我使用sorted()得到的):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

我在寻找一个排序函数它的行为和第一个一样。


当前回答

基于这里的答案,我写了一个natural_sorted函数,它的行为类似于内置函数的排序:

# Copyright (C) 2018, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

源代码也可以在我的GitHub片段存储库: https://github.com/bdrung/snippets/blob/master/natural_sorted.py

其他回答

下面是马克·拜尔回答的一个更加python化的版本:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

现在这个函数可以在任何使用它的函数中用作键,比如list。Sort, sorted, max,等等。

作为lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

完全可重复的演示代码:

import re
natsort = lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]
L = ["a1", "a10", "a11", "a2", "a22", "a3"]   
print(sorted(L, key=natsort))  
# ['a1', 'a2', 'a3', 'a10', 'a11', 'a22'] 

我写了一个基于http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html的函数,它增加了传递自己的“键”参数的能力。我需要这样才能执行包含更复杂对象(不仅仅是字符串)的列表的自然排序。

import re

def natural_sort(list, key=lambda s:s):
    """
    Sort the list into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text 
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    list.sort(key=sort_key)

例如:

my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]
a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

应答:

气泡排序作业

如何在python中一次读一个字母的字符串

我使用的算法是padzero_with_lower,定义如下:

import re

def padzero_with_lower(s):
    return re.sub(r'\d+', lambda m: m.group(0).rjust(10, '0'), s).lower()

该算法发现:

查找并填充任意长度的数字,直到足够大的长度,例如10 然后,它将字符串转换为小写

下面是一个用法示例:

print(padzero_with_lower('file1.txt'))   # file0000000001.txt
print(padzero_with_lower('file12.txt'))  # file0000000012.txt
print(padzero_with_lower('file23.txt'))  # file0000000023.txt
print(padzero_with_lower('file123.txt')) # file0000000123.txt
print(padzero_with_lower('file301.txt')) # file0000000301.txt
print(padzero_with_lower('Dir2/file15.txt'))  # dir0000000002/file0000000015.txt
print(padzero_with_lower('dir2/file123.txt')) # dir0000000002/file0000000123.txt
print(padzero_with_lower('dir15/file2.txt'))  # dir0000000015/file0000000002.txt
print(padzero_with_lower('Dir15/file15.txt')) # dir0000000015/file0000000015.txt
print(padzero_with_lower('elm0'))  # elm0000000000
print(padzero_with_lower('elm1'))  # elm0000000001
print(padzero_with_lower('Elm2'))  # elm0000000002
print(padzero_with_lower('elm9'))  # elm0000000009
print(padzero_with_lower('elm10')) # elm0000000010
print(padzero_with_lower('Elm11')) # elm0000000011 
print(padzero_with_lower('Elm12')) # elm0000000012
print(padzero_with_lower('elm13')) # elm0000000013

测试了这个函数后,我们现在可以使用它作为我们的键。

lis = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
lis.sort(key=padzero_with_lower)
print(lis)
# Output: ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

现在是一些更优雅的东西(pythonic)

-只要碰一下

有很多实现,虽然有些已经接近,但没有一个能完全捕获现代python所提供的优雅。

使用python测试(3.5.1) 包含了一个额外的列表,以演示当 数字在字符串中间 没有测试,但是,我假设如果您的列表是相当大的,那么事先编译正则表达式会更有效 如果这是一个错误的假设,我相信有人会纠正我

罢工

from re import compile, split    
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

完整代码

#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)  
  # ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)  
  # ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']

使用时注意

从操作系统。路径导入分割 您需要区分导入

灵感来自

Python文档-如何排序 人类排序:自然排序顺序 人的分类 这篇文章的贡献者/评论员和引用帖子