我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:

匹配

CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT

不匹配

aWC2H 7LT WC2H 7LTa WC2H

我怎么解决这个问题?


当前回答

虽然这里有很多答案,但我对其中任何一个都不满意。他们中的大多数只是简单地坏了,太复杂或只是坏了。

我看了@ctwheels的答案,我发现它非常具有解释性和正确性;我们必须为此感谢他。然而,对我来说,如此简单的事情又有太多的“数据”了。

幸运的是,我设法获得了一个数据库,其中仅包含英国的100多万个活动邮政编码,并编写了一个小型PowerShell脚本来测试和基准测试结果。

英国邮政编码规格:有效的邮政编码格式。

这是“我的”正则表达式:

^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$

简短,简单,甜蜜。即使是最没有经验的人也能明白发生了什么。

解释:

^ asserts position at start of a line
    1st Capturing Group ([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})
        Match a single character present in the list below [a-zA-Z]
        {1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
        a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
        A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
        Match a single character present in the list below [a-zA-Z\d]
        {1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
        a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
        A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
        \d matches a digit (equivalent to [0-9])
        \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
    2nd Capturing Group (\d[a-zA-Z]{2})
        \d matches a digit (equivalent to [0-9])
        Match a single character present in the list below [a-zA-Z]
        {2} matches the previous token exactly 2 times
        a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
        A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
$ asserts position at the end of a line

结果(已核对邮编):

TOTAL OK: 1469193
TOTAL FAILED: 0
-------------------------------------------------------------------------
Days              : 0
Hours             : 0
Minutes           : 5
Seconds           : 22
Milliseconds      : 718
Ticks             : 3227185939
TotalDays         : 0.00373516891087963
TotalHours        : 0.0896440538611111
TotalMinutes      : 5.37864323166667
TotalSeconds      : 322.7185939
TotalMilliseconds : 322718.5939

其他回答

下面是一个基于链接到marcj答案的文档中指定格式的正则表达式:

/^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}$/

这和规格之间的唯一区别是,根据规格,最后两个字符不能在[CIKMOV]中。

编辑: 下面是另一个测试末尾字符限制的版本。

/^[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-BD-HJLNP-UW-Z]{2}$/

我需要一个可以在SAS中使用PRXMATCH和相关函数的版本,所以我想到了这个:

^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$

测试用例和注意事项:

/* 
Notes
The letters QVX are not used in the 1st position.
The letters IJZ are not used in the second position.
The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
*/

/*
    Bits and pieces
    1st position (any):         [A-PR-UWYZ]         
    2nd position (if letter):   [A-HK-Y]
    3rd position (A1A format):  [A-HJKPSTUW]
    4th position (AA1A format): [ABEHMNPRV-Y]
    Last 2 positions:           [ABD-HJLNP-UW-Z]    
*/


data example;
infile cards truncover;
input valid 1. postcode &$10. Notes &$100.;
flag = prxmatch('/^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$/',strip(postcode));
cards;
1  EC1A 1BB  Special case 1
1  W1A 0AX   Special case 2
1  M1 1AE    Standard format
1  B33 8TH   Standard format
1  CR2 6XH   Standard format
1  DN55 1PT  Standard format
0  QN55 1PT  Bad letter in 1st position
0  DI55 1PT  Bad letter in 2nd position
0  W1Z 0AX   Bad letter in 3rd position
0  EC1Z 1BB  Bad letter in 4th position
0  DN55 1CT  Bad letter in 2nd group
0  A11A 1AA  Invalid digits in 1st group
0  AA11A 1AA  1st group too long
0  AA11 1AAA  2nd group too long
0  AA11 1AAA  2nd group too long
0  AAA 1AA   No digit in 1st group
0  AA 1AA    No digit in 1st group
0  A 1AA     No digit in 1st group
0  1A 1AA    Missing letter in 1st group
0  1 1AA     Missing letter in 1st group
0  11 1AA    Missing letter in 1st group
0  AA1 1A    Missing letter in 2nd group
0  AA1 1     Missing letter in 2nd group
;
run;

不存在能够验证邮政编码的综合英国邮政编码正则表达式。您可以使用正则表达式检查邮政编码的格式是否正确;并不是真的存在。

邮政编码非常复杂,而且不断变化。例如,对于每个邮政编码区域,出码W1没有,也可能永远没有1到99之间的每个数字。

你不能指望当前的东西永远都是真的。举个例子,1990年,邮局认为阿伯丁有点拥挤了。他们在AB1-5的末尾加了一个0,使它成为AB10-50,然后在这些之间创建了一些邮政编码。

每当建立一条新街道时,就会创建一个新的邮政编码。这是获得建筑许可的过程的一部分;地方当局有义务与邮局保持更新(并不是说他们都这样做)。

此外,正如许多其他用户指出的那样,还有一些特殊的邮政编码,如Girobank, GIR 0AA,以及给圣诞老人的信件,SAN TA1 -你可能不想在那里张贴任何东西,但似乎没有任何其他答案。

然后,还有BFPO的邮政编码,现在正在改为更标准的格式。两种格式都是有效的。最后,还有海外领土来源维基百科。

+----------+----------------------------------------------+
| Postcode |                   Location                   |
+----------+----------------------------------------------+
| AI-2640  | Anguilla                                     |
| ASCN 1ZZ | Ascension Island                             |
| STHL 1ZZ | Saint Helena                                 |
| TDCU 1ZZ | Tristan da Cunha                             |
| BBND 1ZZ | British Indian Ocean Territory               |
| BIQQ 1ZZ | British Antarctic Territory                  |
| FIQQ 1ZZ | Falkland Islands                             |
| GX11 1AA | Gibraltar                                    |
| PCRN 1ZZ | Pitcairn Islands                             |
| SIQQ 1ZZ | South Georgia and the South Sandwich Islands |
| TKCA 1ZZ | Turks and Caicos Islands                     |
+----------+----------------------------------------------+

接下来,你必须考虑到英国将其邮政编码系统“输出”到世界上许多地方。任何验证“英国”邮政编码的程序也将验证许多其他国家的邮政编码。

如果您想验证英国邮政编码,最安全的方法是使用当前邮政编码的查找。有很多选择:

Ordnance Survey releases Code-Point Open under an open data licence. It'll be very slightly behind the times but it's free. This will (probably - I can't remember) not include Northern Irish data as the Ordnance Survey has no remit there. Mapping in Northern Ireland is conducted by the Ordnance Survey of Northern Ireland and they have their, separate, paid-for, Pointer product. You could use this and append the few that aren't covered fairly easily. Royal Mail releases the Postcode Address File (PAF), this includes BFPO which I'm not sure Code-Point Open does. It's updated regularly but costs money (and they can be downright mean about it sometimes). PAF includes the full address rather than just postcodes and comes with its own Programmers Guide. The Open Data User Group (ODUG) is currently lobbying to have PAF released for free, here's a description of their position. Lastly, there's AddressBase. This is a collaboration between Ordnance Survey, Local Authorities, Royal Mail and a matching company to create a definitive directory of all information about all UK addresses (they've been fairly successful as well). It's paid-for but if you're working with a Local Authority, government department, or government service it's free for them to use. There's a lot more information than just postcodes included.

在这个列表中添加一个更实用的正则表达式,允许用户输入一个空字符串:

^$|^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,1}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))$

这个正则表达式允许大写字母和小写字母,中间有可选的空格

从软件开发人员的角度来看,这个正则表达式对于地址可能是可选的软件很有用。例如,如果用户不想提供他们的地址详细信息

我一直在寻找一个英国邮政编码正则表达式的最后一天左右,无意中发现了这个线程。我尝试了上面的大部分建议,但没有一个对我有用,所以我想出了自己的正则表达式,据我所知,它捕获了截至1月13日的所有有效的英国邮政编码(根据皇家邮政的最新文献)。

The regex and some simple postcode checking PHP code is posted below. NOTE:- It allows for lower or uppercase postcodes and the GIR 0AA anomaly but to deal with the, more than likely, presence of a space in the middle of an entered postcode it also makes use of a simple str_replace to remove the space before testing against the regex. Any discrepancies beyond that and the Royal Mail themselves don't even mention them in their literature (see http://www.royalmail.com/sites/default/files/docs/pdf/programmers_guide_edition_7_v5.pdf and start reading from page 17)!

注意:在皇家邮政自己的文献中(链接以上),第3和第4位的位置略有模糊,如果这些字符是字母,则例外。我直接联系了皇家邮政,用他们自己的话说,“AANA NAA格式的出境代码的第4个位置的信件没有例外,而第3个位置的例外只适用于ANA NAA格式的出境代码的最后一个字母。”直接从马嘴里说出来的!

<?php

    $postcoderegex = '/^([g][i][r][0][a][a])$|^((([a-pr-uwyz]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[a-hk-y]{1}([0]|[1-9]\d?))|([a-pr-uwyz]{1}[1-9][a-hjkps-uw]{1})|([a-pr-uwyz]{1}[a-hk-y]{1}[1-9][a-z]{1}))(\d[abd-hjlnp-uw-z]{2})?)$/i';

    $postcode2check = str_replace(' ','',$postcode2check);

    if (preg_match($postcoderegex, $postcode2check)) {

        echo "$postcode2check is a valid postcode<br>";

    } else {

        echo "$postcode2check is not a valid postcode<br>";

    }

?>

我希望它能帮助其他遇到这条线索寻找解决方案的人。