Is it worth using Python's re.compile?

Is there any benefit to using compile for regular expressions in Python?

h = re.compile('hello')
h.match('hello world')

re.match('hello', 'hello world')

Current answer

FWIW:

$ python -m timeit -s "import re" "re.match('hello', 'hello world')"
100000 loops, best of 3: 3.82 usec per loop

$ python -m timeit -s "import re; h=re.compile('hello')" "h.match('hello world')"
1000000 loops, best of 3: 1.26 usec per loop

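So, if you are going to use the same regex a lot, it may be worth calling re.compile (especially for more complex regexes).

The standard arguments against premature optimization apply, but I don't think you really lose much clarity or straightforwardness by using re.compile if you suspect your regexps may become a performance bottleneck.

Update:

Under Python 3.6 (I suspect the timings above were done with Python 2.x) and 2018 hardware (MacBook Pro), I now get the following timings: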

% python -m timeit -s "import re" "re.match('hello', 'hello world')"
1000000 loops, best of 3: 0.661 usec per loop

% python -m timeit -s "import re; h=re.compile('hello')" "h.match('hello world')"
1000000 loops, best of 3: 0.285 usec per loop

% python -m timeit -s "import re" "h=re.compile('hello'); h.match('hello world')"
1000000 loops, best of 3: 0.65 usec per loop

% python --version
Python 3.6.5 :: Anaconda, Inc.

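I also added a case (note the difference in quoting between the last two runs) showing that re.match(x, ...) costs [roughly] the same as re.compile(x).match(...). This does not mean that no caching happens behind the scenes: the re documentation states that the compiled versions of the most recently used patterns are cached. Both forms therefore most likely pay for a cache lookup on every call rather than for a fresh compilation, and that lookup is the overhead a precompiled h avoids.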
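The three command-line runs above can also be reproduced from a script. The following is a minimal sketch, not the setup used for the numbers quoted here: the helper name bench, the labels, and the loop count are illustrative assumptions, and absolute timings will differ by machine and Python version.

```python
import re
import timeit

TEXT = "hello world"
h = re.compile("hello")  # precompiled once, outside the timed statement


def bench(stmt, number=50_000):
    """Best per-call time in microseconds over 3 repeats (hypothetical helper)."""
    times = timeit.repeat(
        stmt,
        globals={"re": re, "h": h, "TEXT": TEXT},
        repeat=3,
        number=number,
    )
    return min(times) / number * 1e6


results = {
    "re.match(pattern, text)": bench("re.match('hello', TEXT)"),
    "h.match(text)": bench("h.match(TEXT)"),
    "re.compile(pattern).match(text)": bench("re.compile('hello').match(TEXT)"),
}

for label, usec in results.items():
    print(f"{label:>32}: {usec:.3f} usec per call")
```

Taking the minimum of the repeats mirrors the "best of 3" figure that the timeit command line reports.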

2009-01-16 21:42:37

Other answers

Using the following example:

h = re.compile('hello')
h.match('hello world')

The match method in the example above is not the same as the one used below:

re.match('hello', 'hello world')

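re.compile() returns a regular expression object, which means h is a regex object.

The regex object has its own match method, with optional pos and endpos parameters:

regex.match(string[, pos[, endpos]])

pos

The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.

endpos

The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found; otherwise, if rx is a compiled regular expression object, rx.search(string, 0, 50) is equivalent to rx.search(string[:50], 0).

The regex object's search, findall, and finditer methods also support these parameters.

re.match(pattern, string, flags=0) does not support them, as you can see, and neither do its search, findall, and finditer counterparts.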
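To make the difference between pos/endpos and slicing concrete, here is a small sketch. The sample string and the variable names are illustrative assumptions, not taken from the documentation quoted above.

```python
import re

text = "hello world"
anchored = re.compile(r"^world")

# Slicing builds a new string, so '^' can match at the start of the slice.
sliced = anchored.search(text[6:])
# With pos=6 the search starts at index 6, but '^' still refers to the
# real beginning of text, so nothing matches.
with_pos = anchored.search(text, 6)

letter_o = re.compile(r"o")
# endpos=5 makes the search behave as if text were only 5 characters long.
limited = letter_o.search(text, 0, 5)
same_as_slice = letter_o.search(text[:5], 0)
# endpos=4 stops just before the first 'o' (index 4), so nothing is found.
too_short = letter_o.search(text, 0, 4)

print(sliced is not None, with_pos, limited.span(), too_short)
```

The module-level re.search(pattern, string, flags=0) has no equivalent of the last three calls short of slicing the string first.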

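A match object has attributes that complement these parameters:

match.pos

The value of pos which was passed to the search() or match() method of a regex object. This is the index into the string at which the RE engine started looking for a match.

match.endpos

The value of endpos which was passed to the search() or match() method of a regex object. This is the index into the string beyond which the RE engine will not go.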
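A short sketch of how these two attributes report the search window; the string and names are illustrative assumptions.

```python
import re

text = "hello world"
rx = re.compile(r"o")

bounded = rx.search(text, 5, 9)  # explicit pos and endpos
default = rx.search(text)        # neither given: whole string is searched

# start() is always an index into the original string, not into the window.
print(bounded.pos, bounded.endpos, bounded.start())
print(default.pos, default.endpos, default.start())
```

When pos and endpos are omitted, the match object reports 0 and len(text) respectively.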

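A regex object has two unique, possibly useful, attributes:

regex.groups

The number of capturing groups in the pattern.

regex.groupindex

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.

And finally, a match object has this attribute:

match.re

The regular expression object whose match() or search() method produced this match instance.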
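These attributes can be inspected directly. The date pattern and sample text below are illustrative assumptions chosen to mix named and unnamed groups.

```python
import re

# Two named groups and one unnamed group.
date_rx = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})")

group_count = date_rx.groups            # counts all capturing groups
name_to_number = dict(date_rx.groupindex)  # only the named ones appear here

m = date_rx.search("released on 2009-01-16")
came_from_same_regex = m.re is date_rx

print(group_count, name_to_number, came_from_same_regex)
```

groupindex is wrapped in dict() here only to get a plain dictionary for printing and comparison.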

2013-03-10 23:03:59

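This answer may be arriving late, but it is an interesting find. Using compile can really save you time if you plan to use the regex multiple times (this is also mentioned in the docs). Below you can see that a compiled regex is fastest when the match method is called directly on it. Passing a compiled regex to re.match is the slowest of the three, and passing the pattern string to re.match falls somewhere in between.

>>> import re, timeit
>>> average = lambda *times: sum(times) / len(times)  # not defined in the original; assumed to be a plain mean
>>> ipr = r'\D+((([0-2][0-5]?[0-5]?)\.){3}([0-2][0-5]?[0-5]?))\D+'
>>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
1.5077415757028423
>>> ipr = re.compile(ipr)
>>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
1.8324008992184038
>>> average(*timeit.repeat("ipr.match('abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
0.9187896518778871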

2016-09-12 08:57:31

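Readability / cognitive load preference

To me, the main gain is that I only need to remember, and read, one form of the complicated regex API syntax: the <compiled_pattern>.method(xxx) form, rather than both that and the re.func(<pattern>, xxx) form.

The re.compile(<pattern>) is a bit of extra boilerplate, true.

But where regexes are concerned, that extra compile step is unlikely to be a major cause of cognitive load. In fact, for complicated patterns, you may even gain clarity by separating the declaration from whatever regex method you then invoke on it.

I tend to tune complicated patterns first on a website like Regex101, or even in a separate minimal test script, and then bring them into my code, so separating the declaration from its use also fits my workflow.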
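As a rough illustration of keeping the declaration apart from its use, consider the sketch below. The pattern, the constant name ISO_DATE, and the function parse_date are hypothetical examples, not code from this answer.

```python
import re

# Declared once, where it can be read, tuned and tested on its own.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4}) -   # four-digit year
    (?P<month>\d{2}) -  # two-digit month
    (?P<day>\d{2})      # two-digit day
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return (year, month, day) for the first ISO-style date in text, or None."""
    m = ISO_DATE.search(text)
    if m is None:
        return None
    return tuple(int(m.group(name)) for name in ("year", "month", "day"))


print(parse_date("posted 2020-02-25 18:22:58"))
```

The call site then reads as a single method call on a named pattern, and the pattern itself can be pasted into a regex tester unchanged.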

2020-02-25 18:22:58

(Months later) It is easy to add your own cache around re.match, or anything else for that matter:

""" Re.py: Re.match = re.match + cache
    efficiency: re.py does this already (but what's _MAXCACHE ?)
    readability, inline / separate: matter of taste
"""

import re

cache = {}
_re_type = type( re.compile( "" ))

def match( pattern, string, flags=0 ):
    """ Re.match = re.match + cache re.compile( pattern )
    """
    if isinstance( pattern, _re_type ):
        cpat = pattern  # already compiled: use it as-is
    else:
        key = (pattern, flags)  # flags change the compiled result, so key on both
        if key not in cache:
            cache[key] = re.compile( pattern, flags )
        cpat = cache[key]
    return cpat.match( string )

# def search ...

A wibni (wouldn't it be nice if): cachehint(size=), cacheinfo() -> size, hits, nclear…
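Part of that wish list resembles what functools.lru_cache in the standard library already offers: maxsize acts as a size hint, and cache_info() reports hits, misses, and the current size. The following is only a sketch under that assumption; the names compiled and match and the maxsize value are illustrative, and this is not how the re module implements its own cache.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=128)  # assumed size; plays the role of the "cachehint"
def compiled(pattern, flags=0):
    """Compile pattern once per (pattern, flags) pair and remember the result."""
    return re.compile(pattern, flags)


def match(pattern, string, flags=0):
    """Like re.match, but routed through the explicit cache above."""
    return compiled(pattern, flags).match(string)


first = match("hello", "hello world")   # miss: compiles and stores the pattern
second = match("hello", "hello again")  # hit: reuses the stored pattern
info = compiled.cache_info()

print(info)
```

compiled.cache_clear() empties the cache if the set of patterns changes.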

2009-07-06 10:13:44
