值得使用Python的re.compile吗?

在Python中对正则表达式使用compile有什么好处吗?

h = re.compile('hello')
h.match('hello world')

re.match('hello', 'hello world')

当前回答

我真的很尊重上面所有的答案。在我看来是的!当然，使用re.compile而不是一次又一次地编译正则表达式是值得的。

使用re.compile可以使代码更加动态，因为您可以调用已经编译好的正则表达式，而不是一次又一次地编译。这对你有好处:

处理器的努力时间复杂度。使正则表达式通用。(可以在findall, search, match中使用) 并使您的程序看起来很酷。

例子:

  example_string = "The room number of her room is 26A7B."
  find_alpha_numeric_string = re.compile(r"\b\w+\b")

在Findall中使用

 find_alpha_numeric_string.findall(example_string)

在搜索中使用

  find_alpha_numeric_string.search(example_string)

类似地，您可以将它用于:Match和Substitute

2017-03-14 12:31:47

其他回答

这是个好问题。你经常看到人们毫无理由地使用re.compile。它降低了可读性。但是可以肯定的是，很多时候需要预编译表达式。就像你在循环中重复使用它一样。

这就像编程的一切(实际上是生活中的一切)。运用常识。

2009-01-16 21:44:59

在无意中看到这里的讨论之前，我运行了这个测试。然而，在运行它之后，我想我至少会发布我的结果。

我剽窃了Jeff Friedl的“精通正则表达式”中的例子。这是在一台运行OSX 10.6 (2Ghz英特尔酷睿2双核，4GB内存)的macbook上。Python版本为2.6.1。

运行1 -使用re.compile

import re 
import time 
import fpformat
Regex1 = re.compile('^(a|b|c|d|e|f|g)+$') 
Regex2 = re.compile('^[a-g]+$')
TimesToDo = 1000
TestString = "" 
for i in range(1000):
    TestString += "abababdedfg"
StartTime = time.time() 
for i in range(TimesToDo):
    Regex1.search(TestString) 
Seconds = time.time() - StartTime 
print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"

StartTime = time.time() 
for i in range(TimesToDo):
    Regex2.search(TestString) 
Seconds = time.time() - StartTime 
print "Character Class takes " + fpformat.fix(Seconds,3) + " seconds"

Alternation takes 2.299 seconds
Character Class takes 0.107 seconds

运行2 -不使用re.compile

import re 
import time 
import fpformat

TimesToDo = 1000
TestString = "" 
for i in range(1000):
    TestString += "abababdedfg"
StartTime = time.time() 
for i in range(TimesToDo):
    re.search('^(a|b|c|d|e|f|g)+$',TestString) 
Seconds = time.time() - StartTime 
print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"

StartTime = time.time() 
for i in range(TimesToDo):
    re.search('^[a-g]+$',TestString) 
Seconds = time.time() - StartTime 
print "Character Class takes " + fpformat.fix(Seconds,3) + " seconds"

Alternation takes 2.508 seconds
Character Class takes 0.109 seconds

2010-01-17 21:22:18

我自己刚试过。对于从字符串中解析数字并对其求和的简单情况，使用编译后的正则表达式对象的速度大约是使用re方法的两倍。

正如其他人指出的那样，re方法(包括re.compile)在以前编译的表达式缓存中查找正则表达式字符串。因此，在正常情况下，使用re方法的额外成本只是缓存查找的成本。

然而，检查代码，缓存被限制为100个表达式。这就引出了一个问题，缓存溢出有多痛苦?该代码包含正则表达式编译器的内部接口re.sre_compile.compile。如果我们调用它，就绕过了缓存。结果表明，对于一个基本的正则表达式，例如r'\w+\s+([0-9_]+)\s+\w*'，它要慢两个数量级。

下面是我的测试:

#!/usr/bin/env python
import re
import time

def timed(func):
    def wrapper(*args):
        t = time.time()
        result = func(*args)
        t = time.time() - t
        print '%s took %.3f seconds.' % (func.func_name, t)
        return result
    return wrapper

regularExpression = r'\w+\s+([0-9_]+)\s+\w*'
testString = "average    2 never"

@timed
def noncompiled():
    a = 0
    for x in xrange(1000000):
        m = re.match(regularExpression, testString)
        a += int(m.group(1))
    return a

@timed
def compiled():
    a = 0
    rgx = re.compile(regularExpression)
    for x in xrange(1000000):
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

@timed
def reallyCompiled():
    a = 0
    rgx = re.sre_compile.compile(regularExpression)
    for x in xrange(1000000):
        m = rgx.match(testString)
        a += int(m.group(1))
    return a


@timed
def compiledInLoop():
    a = 0
    for x in xrange(1000000):
        rgx = re.compile(regularExpression)
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

@timed
def reallyCompiledInLoop():
    a = 0
    for x in xrange(10000):
        rgx = re.sre_compile.compile(regularExpression)
        m = rgx.match(testString)
        a += int(m.group(1))
    return a

r1 = noncompiled()
r2 = compiled()
r3 = reallyCompiled()
r4 = compiledInLoop()
r5 = reallyCompiledInLoop()
print "r1 = ", r1
print "r2 = ", r2
print "r3 = ", r3
print "r4 = ", r4
print "r5 = ", r5
</pre>
And here is the output on my machine:
<pre>
$ regexTest.py 
noncompiled took 4.555 seconds.
compiled took 2.323 seconds.
reallyCompiled took 2.325 seconds.
compiledInLoop took 4.620 seconds.
reallyCompiledInLoop took 4.074 seconds.
r1 =  2000000
r2 =  2000000
r3 =  2000000
r4 =  2000000
r5 =  20000

'reallyCompiled'方法使用内部接口，绕过缓存。注意，在每个循环迭代中编译的代码只迭代了10,000次，而不是一百万次。

2010-04-14 04:40:24

我的理解是，这两个例子实际上是等价的。唯一的区别是，在第一种情况下，您可以在其他地方重用已编译的正则表达式，而不会导致再次编译它。

这里有一个参考:http://diveintopython3.ep.io/refactoring.html

使用字符串'M'调用已编译模式对象的搜索函数，其效果与同时使用正则表达式和字符串'M'调用re.search相同。只是要快得多。(事实上，re.search函数只是编译正则表达式，并为您调用结果模式对象的搜索方法。)

2009-01-16 21:38:12

抛开性能差异不考虑，使用re.compile和使用编译后的正则表达式对象进行匹配(任何与正则表达式相关的操作)使得Python运行时的语义更加清晰。

我有过调试一些简单代码的痛苦经历:

compare = lambda s, p: re.match(p, s)

然后我用compare in

[x for x in data if compare(patternPhrases, x[columnIndex])]

其中patternPhrases应该是一个包含正则表达式字符串的变量，x[columnIndex]是一个包含字符串的变量。

我有麻烦，patternPhrases不匹配一些预期的字符串!

但是如果我使用re.compile形式:

compare = lambda s, p: p.match(s)

然后在

[x for x in data if compare(patternPhrases, x[columnIndex])]

Python会抱怨“字符串没有匹配属性”，因为在compare中通过位置参数映射，x[columnIndex]被用作正则表达式!其实我的意思是

compare = lambda p, s: p.match(s)

在我的例子中，使用re.compile更明确地表达了正则表达式的目的，当它的值对肉眼隐藏时，因此我可以从Python运行时检查中获得更多帮助。

因此，我这一课的寓意是，当正则表达式不仅仅是字面字符串时，那么我应该使用re.compile让Python帮助我断言我的假设。

2013-07-11 16:00:03

值得使用Python的re.compile吗?

推荐文章

最新文章

标签