Is it worth using Python's re.compile?

Is there any benefit in using compile for regular expressions in Python?

h = re.compile('hello')
h.match('hello world')

re.match('hello', 'hello world')

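Current answer

With the second version, the regular expression is still compiled before it is used. If you are going to execute it many times, it is definitely better to compile it first. If not, compiling on every call is fine for one-off matches.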
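To make the two styles concrete, here is a minimal sketch that runs both over the same data and collects the same result. The `lines` list is made-up sample data, not something from the question. Note that the `re` module documents an internal cache of recently compiled patterns, so the module-level call does not necessarily recompile from scratch each time; it still pays for the cache lookup on every call.

```python
import re

# Hypothetical sample data, for illustration only.
lines = ["hello world", "goodbye world", "hello again"]

# Style 1: compile once, then reuse the pattern object.
hello = re.compile("hello")
compiled_hits = [line for line in lines if hello.match(line)]

# Style 2: pass the pattern string to the module-level function on every call.
inline_hits = [line for line in lines if re.match("hello", line)]

print(compiled_hits)
print(inline_hits)
```

Both lists contain the same lines; the difference is only in where the compile (or cache lookup) cost is paid.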

2009-01-16 21:36:28

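Other answers

Performance difference aside, using re.compile and then using the compiled regular expression object for matching (or any other regex-related operation) makes the semantics clearer to the Python runtime.

I had a painful experience debugging some simple code: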

compare = lambda s, p: re.match(p, s)

and later I used compare in

[x for x in data if compare(patternPhrases, x[columnIndex])]

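where patternPhrases is supposed to be a variable containing a regular expression string, and x[columnIndex] is a variable containing a string.

I ran into trouble: patternPhrases did not match some of the expected strings!

But if I had used the re.compile form: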

compare = lambda s, p: p.match(s)

and then in

[x for x in data if compare(patternPhrases, x[columnIndex])]

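Python would have complained with AttributeError: 'str' object has no attribute 'match', because with the positional argument mapping in compare, x[columnIndex] is used as the regular expression! What I actually meant was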

compare = lambda p, s: p.match(s)

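In my case, using re.compile states the purpose of the regular expression more explicitly when its value is hidden from the naked eye, so I get more help from Python's run-time checking.

So the moral of my lesson is that when the regular expression is not just a literal string, I should use re.compile and let Python help me assert my assumption.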
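The failure mode described above can be reproduced in a few lines. This is a sketch with a made-up pattern and input string (the names `compare_str`, `compare_obj`, `pattern` and `text` are illustrative, not from the original code): the string-based helper silently returns no match when the arguments are swapped, while the compiled-object helper fails loudly.

```python
import re

pattern = re.compile(r"\d+")  # hypothetical pattern
text = "42 apples"            # hypothetical input

# Both helpers declare (s, p) but are called as (pattern, string): the bug.
compare_str = lambda s, p: re.match(p, s)
compare_obj = lambda s, p: p.match(s)

# String version: the text ends up being used as the pattern, and the
# call quietly returns None instead of reporting the mix-up.
silent = compare_str(r"\d+", text)

# Compiled version: the str argument has no .match method, so Python
# raises immediately.
try:
    compare_obj(pattern, text)
    error = None
except AttributeError as exc:
    error = exc

print(silent)
print(error)
```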

2013-07-11 16:00:03

Here is a simple test case:

~$ for x in 1 10 100 1000 10000 100000 1000000; do python -m timeit -n $x -s 'import re' 're.match("[0-9]{3}-[0-9]{3}-[0-9]{4}", "123-123-1234")'; done
1 loops, best of 3: 3.1 usec per loop
10 loops, best of 3: 2.41 usec per loop
100 loops, best of 3: 2.24 usec per loop
1000 loops, best of 3: 2.21 usec per loop
10000 loops, best of 3: 2.23 usec per loop
100000 loops, best of 3: 2.24 usec per loop
1000000 loops, best of 3: 2.31 usec per loop

With re.compile:

~$ for x in 1 10 100 1000 10000 100000 1000000; do python -m timeit -n $x -s 'import re' 'r = re.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}")' 'r.match("123-123-1234")'; done
1 loops, best of 3: 1.91 usec per loop
10 loops, best of 3: 0.691 usec per loop
100 loops, best of 3: 0.701 usec per loop
1000 loops, best of 3: 0.684 usec per loop
10000 loops, best of 3: 0.682 usec per loop
100000 loops, best of 3: 0.694 usec per loop
1000000 loops, best of 3: 0.702 usec per loop

So in this simple case, compiling seems to be faster, even if you only match once.
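For anyone who prefers to rerun the comparison from inside Python, here is a rough equivalent using the timeit module directly. Unlike the shell commands above, which pass re.compile as one of the timed statements, this sketch compiles once before timing starts. The iteration count and variable names are my own choices, and the numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"
TEXT = "123-123-1234"

# Compiled once, outside the timed statement.
compiled = re.compile(PATTERN)

n = 100_000  # arbitrary iteration count

# Module-level function: pattern string handed to re.match on every call.
t_inline = timeit.timeit(lambda: re.match(PATTERN, TEXT), number=n)

# Precompiled pattern object: only the match itself is timed.
t_compiled = timeit.timeit(lambda: compiled.match(TEXT), number=n)

print("re.match:       %.3f usec per loop" % (t_inline / n * 1e6))
print("compiled.match: %.3f usec per loop" % (t_compiled / n * 1e6))
```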

2012-11-30 07:24:30

Using re.compile() has the added benefit of letting you add comments to your regex patterns with re.VERBOSE:

pattern = '''
hello[ ]world    # Some info on my pattern logic. [ ] to recognize space
'''

re.search(pattern, 'hello world', re.VERBOSE)

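Although this does not affect how fast your code runs, I like to do it this way because it is part of my commenting habit. I thoroughly dislike spending time trying to remember the logic behind my code when I come back to modify it.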
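The same flag can be passed to re.compile itself, so the commented pattern is built once and then reused. A minimal sketch, with a phone-number pattern chosen purely for illustration:

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and everything
# after an unescaped # on a line is treated as a comment.
phone = re.compile(r"""
    [0-9]{3}    # first group of three digits
    -           # literal separator
    [0-9]{3}    # second group of three digits
    -           # literal separator
    [0-9]{4}    # final group of four digits
""", re.VERBOSE)

m = phone.match("123-123-1234")
print(m.group(0) if m else None)
```

Because whitespace is ignored in this mode, a literal space has to be written explicitly, for example as `[ ]`, which is what the pattern in the answer above does.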

2015-03-20 03:39:09

Interestingly, compiling does prove more efficient for me (Python 2.5.2 on Win XP):

import re
import time

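# Raw strings keep backslashes such as \w and \s intact for the regex engine.
rgx = re.compile(r'(\w+)\s+[0-9_]?\s+\w*')
text = "average    2 never"  # renamed from str to avoid shadowing the builtin
a = 0

t = time.time()

for i in xrange(1000000):
    if re.match(r'(\w+)\s+[0-9_]?\s+\w*', text):
    #~ if rgx.match(text):
        a += 1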
        a += 1

print time.time() - t

Run the above code once as is, and once with the two if lines commented the other way around: the compiled regex is twice as fast.
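The script above is Python 2 (xrange, print statement). A rough Python 3 port, under my own assumptions, could look like this: the `count_matches` helper and its `use_compiled` switch are hypothetical names that replace the manual comment toggling, and the iteration count is reduced so it finishes quickly. I make no claim that the 2x ratio holds on current interpreters.

```python
import re
import time

PATTERN = r"(\w+)\s+[0-9_]?\s+\w*"
rgx = re.compile(PATTERN)
text = "average    2 never"


def count_matches(use_compiled, iterations=100_000):
    """Run the match in a loop; return (number of hits, elapsed seconds)."""
    hits = 0
    start = time.perf_counter()
    for _ in range(iterations):
        if use_compiled:
            m = rgx.match(text)
        else:
            m = re.match(PATTERN, text)
        if m:
            hits += 1
    return hits, time.perf_counter() - start


hits_inline, t_inline = count_matches(use_compiled=False)
hits_compiled, t_compiled = count_matches(use_compiled=True)
print("re.match:  %.3f s" % t_inline)
print("rgx.match: %.3f s" % t_compiled)
```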

2009-01-20 18:06:57

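"I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference."

The votes on the accepted answer lead to the assumption that what @Triptych says is true in all cases. That is not necessarily so. One big difference arises when you have to decide whether a function should accept a regex string or a compiled regex object as its parameter: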

>>> timeit.timeit(setup="""
... import re
... f=lambda x, y: x.match(y)       # accepts compiled regex as parameter
... h=re.compile('hello')
... """, stmt="f(h, 'hello world')")
0.32881879806518555
>>> timeit.timeit(setup="""
... import re
... f=lambda x, y: re.compile(x).match(y)   # compiles when called
... """, stmt="f('hello', 'hello world')")
0.809190034866333

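It is always better to compile your regexes if you need to reuse them.

Note that the timeit example above simulates creating the compiled regex object once at import time, versus creating it "on the fly" each time a match is needed.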
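If you do not want to force that choice on callers, one option is a function that accepts either form. This sketch relies on the fact that re.compile returns an already compiled pattern unchanged when no flags are passed; `first_match` and `HELLO` are hypothetical names used only for illustration.

```python
import re


def first_match(pattern, text):
    """Return the first match of pattern in text, or None.

    pattern may be a regex string or an already compiled pattern object.
    """
    # For a compiled pattern (and no flags) this hands the same object back;
    # for a string it compiles, or fetches the pattern from re's cache.
    regex = re.compile(pattern)
    return regex.search(text)


# Created once at import time, as in the faster timeit case above.
HELLO = re.compile("hello")

from_object = first_match(HELLO, "hello world")
from_string = first_match("hello", "hello world")
print(from_object.group(0), from_string.group(0))
```

Callers that care about speed can pass a module-level compiled pattern, and everyone else can keep passing strings.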

2017-01-04 13:20:09
