我想提取一个字符串中包含的所有数字。正则表达式和isdigit()方法哪个更适合这个目的?

例子:

line = "hello 12 hi 89"

结果:

[12, 89]

当前回答

我发现的最干净的方法是:

>>> data = 'hs122 125 &55,58, 25'
>>> new_data = ''.join((ch if ch in '0123456789.-e' else ' ') for ch in data)
>>> numbers = [i for i in new_data.split()]
>>> print(numbers)
['122', '125', '55', '58', '25']

或:

>>> import re
>>> data = 'hs122 125 &55,58, 25'
>>> numbers = re.findall(r'\d+', data)
>>> print(numbers)
['122', '125', '55', '58', '25']

其他回答

line2 = "hello 12 hi 89"  # this is the given string 
temp1 = re.findall(r'\d+', line2) # find number of digits through regular expression
res2 = list(map(int, temp1))
print(res2)

可以使用findall表达式通过digit搜索字符串中的所有整数。

在第二步中,创建一个列表res2,并将string中找到的数字添加到该列表中。

这有点晚了,但是您也可以扩展正则表达式来考虑科学符号。

import re

# Format is [(<string>, <expected output>), ...]
ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
       ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
      ('hello X42 I\'m a Y-32.35 string Z30',
       ['42', '-32.35', '30']),
      ('he33llo 42 I\'m a 32 string -30', 
       ['33', '42', '32', '-30']),
      ('h3110 23 cat 444.4 rabbit 11 2 dog', 
       ['3110', '23', '444.4', '11', '2']),
      ('hello 12 hi 89', 
       ['12', '89']),
      ('4', 
       ['4']),
      ('I like 74,600 commas not,500', 
       ['74,600', '500']),
      ('I like bad math 1+2=.001', 
       ['1', '+2', '.001'])]

for s, r in ss:
    rr = re.findall("[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print('GOOD')
    else:
        print('WRONG', rr, 'should be', r)

给予一切美好!

此外,您还可以查看AWS Glue内置正则表达式

我将使用regexp:

>>> import re
>>> re.findall(r'\d+', "hello 42 I'm a 32 string 30")
['42', '32', '30']

这也匹配bla42bla中的42。如果你只想用单词边界(空格,句号,逗号)分隔数字,你可以使用\b:

>>> re.findall(r'\b\d+\b', "he33llo 42 I'm a 32 string 30")
['42', '32', '30']

以数字列表而不是字符串列表结束:

>>> [int(s) for s in re.findall(r'\b\d+\b', "he33llo 42 I'm a 32 string 30")]
[42, 32, 30]

注意:这对负整数不起作用

我发现的最干净的方法是:

>>> data = 'hs122 125 &55,58, 25'
>>> new_data = ''.join((ch if ch in '0123456789.-e' else ' ') for ch in data)
>>> numbers = [i for i in new_data.split()]
>>> print(numbers)
['122', '125', '55', '58', '25']

或:

>>> import re
>>> data = 'hs122 125 &55,58, 25'
>>> numbers = re.findall(r'\d+', data)
>>> print(numbers)
['122', '125', '55', '58', '25']

我找到的最佳选择如下。它将提取一个数字,并可以消除任何类型的字符。

def extract_nbr(input_str):
    if input_str is None or input_str == '':
        return 0

    out_number = ''
    for ele in input_str:
        if ele.isdigit():
            out_number += ele
    return float(out_number)