我找到了这个关于正则表达式的教程,虽然我直观地理解了“greedy”,“勉强”和“possession”限定词的作用,但我的理解中似乎有一个严重的漏洞。

具体来说,在下面的例子中:

Enter your regex: .*foo // Greedy qualifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo // Reluctant qualifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // Possessive qualifier
Enter input string to search: xfooxxxxxxfoo
No match found.

解释提到了吃掉整个输入字符串,字母被消耗,匹配器后退,最右边出现的“foo”被反流,等等。

不幸的是,尽管有这些美好的比喻,我还是不明白谁吃什么……您是否知道其他教程(简要地)解释正则表达式引擎如何工作?

或者,如果有人能用不同的措辞解释下一段,那将是非常感激的:

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f", "o", "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f", "o", "o") have already been consumed [by whom?]. So the matcher slowly backs off [from right-to-left?] one letter at a time until the rightmost occurrence of "foo" has been regurgitated [what does this mean?], at which point the match succeeds and the search ends. The second example, however, is reluctant, so it starts by first consuming [by whom?] "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow [who swallows?] the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13. The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+ [how?], leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off [what does back off mean?]; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.


当前回答

贪婪:“匹配最长的字符序列”

勉强:“匹配最短的字符序列”

所有格:这有点奇怪,因为它并不(与贪婪和不情愿相反)试图为整个正则表达式找到一个匹配。

顺便说一句:任何正则模式匹配器实现都不会使用回溯。所有现实生活中的模式匹配器都非常快——几乎与正则表达式的复杂性无关!

其他回答

贪婪:“匹配最长的字符序列”

勉强:“匹配最短的字符序列”

所有格:这有点奇怪,因为它并不(与贪婪和不情愿相反)试图为整个正则表达式找到一个匹配。

顺便说一句:任何正则模式匹配器实现都不会使用回溯。所有现实生活中的模式匹配器都非常快——几乎与正则表达式的复杂性无关!

我会试试看的。

A greedy quantifier first matches as much as possible. So the .* matches the entire string. Then the matcher tries to match the f following, but there are no characters left. So it "backtracks", making the greedy quantifier match one less character (leaving the "o" at the end of the string unmatched). That still doesn't match the f in the regex, so it backtracks one more step, making the greedy quantifier match one less character again (leaving the "oo" at the end of the string unmatched). That still doesn't match the f in the regex, so it backtracks one more step (leaving the "foo" at the end of the string unmatched). Now, the matcher finally matches the f in the regex, and the o and the next o are matched too. Success!

A reluctant or "non-greedy" quantifier first matches as little as possible. So the .* matches nothing at first, leaving the entire string unmatched. Then the matcher tries to match the f following, but the unmatched portion of the string starts with "x" so that doesn't work. So the matcher backtracks, making the non-greedy quantifier match one more character (now it matches the "x", leaving "fooxxxxxxfoo" unmatched). Then it tries to match the f, which succeeds, and the o and the next o in the regex match too. Success!

在您的示例中,它将使用字符串的剩余未匹配部分“xxxxxxfoo”重新开始该过程,遵循相同的过程。

所有格量词就像贪婪量词一样,但它不会倒退。所以它以。*开始匹配整个字符串,没有不匹配的。那么它就没有什么可以与正则表达式中的f匹配了。由于所有格量词没有回溯,匹配就失败了。

下面是我使用单元格和索引位置(参见这里的图表来区分单元格和索引)。

贪婪-匹配尽可能多的贪婪量词和整个正则表达式。如果没有匹配,则回溯贪婪量词。

输入字符串:xfooxxxxxxfoo Regex: * foo

上面的Regex有两部分: (我)”。*’和 (2)“foo” 下面的每个步骤都将分析这两个部分。匹配“Pass”或“Fail”的附加注释在大括号内解释。

步骤1: (i) .* = xfooxxxxxxfoo - PASS('。*'是一个贪婪的量词,将使用整个输入字符串) (ii) foo =索引13后没有字符要匹配- FAIL 比赛失败了。

步骤2: (i) .* = xfooxxxxxxfo - PASS(贪心量词'.*'的回溯) (ii) foo = o - FAIL 比赛失败了。

步骤3: (i) .* = xfooxxxxxxf - PASS(贪心量词'.*'的回溯) (ii) foo = oo - FAIL 比赛失败了。

步骤4: (i) .* = xfooxxxxxx - PASS(贪心量词'.*'的回溯) (ii) foo = foo - PASS 报告匹配

结果:1个(es) 我找到了文本“xfooxxxxxxfoo”,从索引0开始,到索引13结束。

勉强-尽可能少地匹配勉强量词,并匹配整个正则表达式。如果没有匹配,则为不情愿量词添加字符。

输入字符串:xfooxxxxxxfoo Regex: . * ?喷火

上面的正则表达式有两部分: (我)”。* ?”, (2)“foo”

步骤1: . * ?= "(空白)- PASS(尽可能少匹配不情愿的量词'.*?'。索引0包含"是一个匹配。) foo = xfo - FAIL(单元格0,1,2 -即索引在0和3之间) 比赛失败了。

步骤2: . * ?= x - PASS(为不情愿量词'.*?'添加字符。单元格0有'x'是匹配的。) foo = foo - PASS 报告匹配

步骤3: . * ?= "(空白)- PASS(尽可能少匹配不情愿的量词'.*?'。索引4包含"是一个匹配。) foo = xxx - FAIL(单元格4,5,6 -即索引在4和7之间) 比赛失败了。

步骤4: . * ?= x - PASS(为不情愿量词'.*?'添加字符。细胞4。) foo = xxx - FAIL(单元格5,6,7 -即索引在5和8之间) 比赛失败了。

步骤5: . * ?= xx - PASS(为不情愿量词'.*?'添加字符。4号到5号单元) foo = xxx - FAIL(单元格6,7,8 -即索引在6和9之间) 比赛失败了。

步骤6: . * ?= xxx - PASS(为不情愿量词'.*?'添加字符。4号到6号牢房) foo = xxx - FAIL(单元格7,8,9 -即索引在7和10之间) 比赛失败了。

第七步: . * ?= xxxx - PASS(为不情愿量词'.*?'添加字符。4号到7号牢房) foo = xxf - FAIL(单元格8,9,10 -即索引在8和11之间) 比赛失败了。

第八步: . * ?= xxxxx - PASS(为不情愿量词'.*?'添加字符。4号到8号牢房) foo = xfo - FAIL(单元格9,10,11 -即索引在9和12之间) 比赛失败了。

步骤9: . * ?= xxxxxx - PASS(为不情愿量词'.*?'添加字符。4号至9号牢房) foo = foo - PASS(单元格10,11,12 -即索引在10和13之间) 报告匹配

第十步: . * ?= "(空白)- PASS(尽可能少匹配不情愿的量词'.*?'。索引13为空。) foo =没有字符需要匹配- FAIL(索引13之后没有任何字符需要匹配) 比赛失败了。

结果:2个(es) 我发现文本“xfoo”从索引0开始,在索引4结束。 我找到了文本“xxxxxxfoo”,从索引4开始,到索引13结束。

所有格-尽可能匹配所有格量词并匹配整个正则表达式。不要反悔。

输入字符串:xfooxxxxxxfoo Regex: . * + foo

上面的正则表达式有两部分:'。*+'和'foo'。

步骤1: .*+ = xfooxxxxxxfoo - PASS(尽可能匹配所有格量词'.*') foo =没有剩余字符匹配- FAIL(索引13后没有匹配) 比赛失败了。

注意:不允许回溯。

结果:0次匹配(es)

这只是我的实践输出可视化的场景-

贪婪量化涉及在迭代期间使用字符串中所有剩余的未验证字符进行模式匹配。未验证的字符从活动序列开始。每次不匹配时,就隔离最后的字符,并再次执行检查。

当活动序列仅满足正则表达式模式的前导条件时,将尝试针对隔离验证其余条件。如果验证成功,则验证隔离区中匹配的字符,剩余的未匹配字符仍未验证,并将在下一次迭代中重新开始该流程时使用。

字符流是从活动序列进入隔离区的。结果的行为是尽可能多地将原始序列包含在匹配中。

不情愿量化与贪婪限定在很大程度上是相同的,除了字符流是相反的——也就是说,它们从隔离开始流入活动序列。结果是匹配中尽可能少地包含原始序列。

所有格量化没有隔离区,包括固定活动序列中的所有内容。