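I am tired of always trying to guess whether I should escape special characters such as '()[]{}|' when using the many implementations of regexps.
It differs between, for instance, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there a set of rules that tells me when I should, and when I should not, escape special characters? Does it depend on the regexp type, such as PCRE, POSIX or extended regexps?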
Current answer
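To avoid having to worry about which regex flavor you are dealing with and all its bespoke peculiarities, just use this generic function, which covers every regex flavor other than BRE (unless the flavor treats Unicode multi-byte characters as metacharacters):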
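jot -s '' -c - 32 126 |
mawk '
# Wrap each punctuation character in its own bracket expression, then
# backslash-escape the two that are awkward inside brackets: \ and ^.
function bracket_escape(s,    bs) {
    gsub("[][!-/_\140:-@{-~]", "[&]", s)
    bs = "\\\\"
    gsub("[" bs "^]", bs "&", s)
    return s
}
# The driver rule was lost in extraction; this one simply prints each
# input line followed by its escaped form.
{ print; print bracket_escape($0) }'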
!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
[!]["][#][$][%][&]['][(][)][*][+][,][-][.][/]
0 1 2 3 4 5 6 7 8 9 [:][;][<][=][>][?]
[@] ABCDEFGHIJKLMNOPQRSTUVWXYZ [[]\\ []]\^ [_]
[`] abcdefghijklmnopqrstuvwxyz [{][|][}][~]
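Square brackets are much easier to deal with, since there is no risk of triggering warning messages about "escaping too much", as happens with a backslash-based version such as: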
function backslash_escape(s) {
    # Prefix every punctuation character with a backslash.
    gsub("[[:punct:]]", "\\\\&", s)
    return s
}
\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/ 0123456789\:\;\<\=\>\?
\@ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\\\]\^\_\`abcdefghijklmnopqrstuvwxyz \{\|\}\~
gawk: cmd. line:1: warning: regexp escape sequence `\!' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\"' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\%' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\&' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\,' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\:' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\;' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\=' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\@' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\_' is not a known regexp operator
gawk: cmd. line:1: warning: regexp escape sequence `\~' is not a known regexp operator
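The bracket trick is not specific to awk. Below is a minimal Python sketch of the same idea, not the answer author's code: the name `bracket_escape` is made up for illustration, and, as an assumption on my part, `[` is backslash-escaped along with `\` and `^` because Python warns about a `[` at the start of a set.

```python
import re
import string


def bracket_escape(text: str) -> str:
    """Return a pattern that matches text literally (ASCII punctuation only)."""
    pieces = []
    for ch in text:
        if ch in "\\^[":
            # Awkward inside a set in Python, so fall back to a backslash.
            pieces.append("\\" + ch)
        elif ch in string.punctuation:
            # Inside [...] the remaining punctuation has no special meaning.
            pieces.append("[" + ch + "]")
        else:
            pieces.append(ch)
    return "".join(pieces)


sample = "a.b*c (d) [e] {f} x|y ^z$ \\w"
pattern = bracket_escape(sample)
print(pattern)
print(re.fullmatch(pattern, sample) is not None)  # True
```

Because only `\`, `^` and `[` receive a backslash, no "unknown escape" style warnings of the kind shown above can arise for the other characters.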
Other answers
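POSIX recognizes multiple variations on regular expressions: basic regular expressions (BRE) and extended regular expressions (ERE). Even then, there are quirks because of the historical implementations of the utilities that POSIX standardized.
There is no simple rule for when to use which notation, or even for which notation a given command uses.
Check out Jeff Friedl's book Mastering Regular Expressions.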
For Ionic (TypeScript) you have to use a double backslash to escape the characters. For example (this one matches some special characters):
"^(?=.*[\\]\\[!¡\'=ªº\\-\\_ç@#$%^&*(),;\\.?\":{}|<>\+\\/])"
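Pay attention to the characters ] - _ . / in particular. They have to be preceded by a double backslash. If you don't do that, you will get a type error in your code.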
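The doubling comes from writing the pattern inside a string literal: the string parser consumes one level of backslashes before the regex engine sees anything. The sketch below shows that two-layer effect in Python rather than TypeScript, purely as an illustration of the general idea; the variable names are hypothetical.

```python
import re

plain = "\\."   # ordinary string literal: two characters, a backslash and a dot
raw = r"\."     # raw string literal: the same two characters
assert plain == raw

print(len(plain))                        # 2
print(bool(re.fullmatch(plain, ".")))    # True: the escaped dot matches a literal dot
print(bool(re.fullmatch(plain, "x")))    # False: it no longer matches any character
print(bool(re.fullmatch(".", "x")))      # True: an unescaped dot matches anything
```

Where the language offers raw strings or regex literals, using them removes the string-literal layer, so only the regex-level escape remains.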
https://perldoc.perl.org/perlre.html#Quoting-metacharacters and https://perldoc.perl.org/functions/quotemeta.html
In the official documentation, such characters are called metacharacters. An example of quoting:
my $regex = quotemeta($string);
s/$regex/something/;
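Python, which the question also mentions, ships a comparable helper in its standard library, `re.escape`. A short sketch of using it the same way as `quotemeta` above (the sample strings are invented for illustration):

```python
import re

text = "price (USD): $5.00 [approx.]"
pattern = re.escape(text)  # every regex metacharacter in text is now literal

haystack = "total price (USD): $5.00 [approx.] today"
print(re.search(pattern, haystack) is not None)   # True

# Only the literal "a.b" is replaced; "axb" is left alone.
print(re.sub(re.escape("a.b"), "X", "a.b axb"))   # X axb
```

Letting the library that will interpret the pattern do the quoting avoids having to remember which characters that particular flavor treats as special.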
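Unfortunately, the meanings of things like ( and \( are swapped between Emacs-style regular expressions and most other styles. So if you try to escape them, you may end up doing the opposite of what you want.
So you really have to know which style you are trying to quote for.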