Extract hostname from a string

I want to match just the root of a URL, not the whole URL, from a text string. Given:

http://www.youtube.com/watch?v=ClkQA2Lb_iE
http://youtu.be/ClkQA2Lb_iE
http://www.example.com/12xy45
http://example.com/random

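I want the last 2 instances to resolve to the www.example.com or example.com domain.

I heard that regex is slow, and this would be my second regex on the page, so if there is a way to do it without regex, let me know.

I'm looking for a JS/jQuery version of this solution.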

Current answer

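Try the code below to get the domain name with a regular expression (note that this snippet is Java, not JavaScript):

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  String line = "http://www.youtube.com/watch?v=ClkQA2Lb_iE";

  // For this input, group(2) captures "youtube"
  String pattern3 = "([\\w\\W]\\.)+(.*)?(\\.[\\w]+)";

  Pattern r = Pattern.compile(pattern3);

  Matcher m = r.matcher(line);
  if (m.find()) {
    System.out.println("Found value: " + m.group(2));
  } else {
    System.out.println("NO MATCH");
  }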

2016-04-25 10:11:55

Other answers

My code looks like this. Regular expressions can take many forms; below are my test cases. I think this version is more extensible.

function extractUrlInfo(url){
  let reg = /^((?<protocol>http[s]?):\/\/)?(?<host>((\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])|[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)))(\:(?<port>[0-9]|[1-9]\d|[1-9]\d{2}|[1-9]\d{3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))?$/
  return reg.exec(url).groups
}

var url = "https://192.168.1.1:1234"
console.log(extractUrlInfo(url))

var url = "https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string"
console.log(extractUrlInfo(url))

2020-02-27 07:29:31

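Put simply, you can do this:

var url = "http://www.someurl.com/support/feature"

function getDomain(url){
  // Take everything after the "//" that follows the protocol...
  var domain = url.split("//")[1];
  // ...and cut it off at the first "/".
  return domain.split("/")[0];
}

// Example:
getDomain("http://www.example.com/page/1")
// output: "www.example.com"

Use the function above to get the domain name. Note that it expects the input to contain a protocol followed by "//".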

2016-05-17 13:39:27

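I'll give you 3 possible solutions:

1. Use the npm package psl, which extracts the domain from anything you throw at it.
2. Use my custom implementation extractRootDomain, which works in most cases.
3. URL(url).hostname works, but not for every edge case.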

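1. Using the npm package psl (Public Suffix List)

The "Public Suffix List" is a list of all valid domain suffixes and rules. It covers not only country-code top-level domains but also Unicode names that are treated as the root domain (e.g. www.食狮.公司.cn, b.c.kobe.jp, etc.).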

Try:

npm install --save psl

Then run it with my "extractHostname" implementation:

let psl = require('psl');
let url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
psl.get(extractHostname(url)); // returns youtube.com
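The extractHostname helper called above is not included in this text. A minimal sketch of what such a helper could look like, assuming it only needs to strip the protocol, path, port and query string (the body below is an assumption, not necessarily the implementation the answer refers to):

```javascript
// Hypothetical helper: the original implementation is not shown in this answer.
function extractHostname(url) {
  var hostname;
  // With a protocol ("http://host/path") the host is the third "/"-separated
  // piece; without one ("host/path") it is the first.
  if (url.indexOf("//") > -1) {
    hostname = url.split("/")[2];
  } else {
    hostname = url.split("/")[0];
  }
  // Drop a port number and any query string that directly follows the host.
  hostname = hostname.split(":")[0];
  hostname = hostname.split("?")[0];
  return hostname;
}

console.log(extractHostname("http://www.youtube.com/watch?v=ClkQA2Lb_iE")); // www.youtube.com
console.log(extractHostname("example.com:8080/random")); // example.com
```

The result still contains subdomains such as www, which is why it is passed on to psl.get in the snippet above.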

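2. My custom implementation of extractRootDomain

Below is my implementation, which is also run against a variety of possible URL inputs.

You can extract the domain whether or not a protocol or port number is present. This is a very simplified, non-regex solution, so I think it will handle the data set given in the question.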
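The implementation itself is missing from this text. As a rough sketch of the idea only (not the answer's original code), a root-domain extractor could isolate the host and then keep its last two labels, or three when the host ends in a two-part suffix such as co.uk. The suffix list below is a small, hypothetical sample, and the sketch uses two short regexes for brevity, unlike the non-regex version described above:

```javascript
// Hypothetical sketch: not the answer's original extractRootDomain.
function extractRootDomain(url) {
  // Isolate the host: drop the protocol, then path, query, fragment and port.
  var host = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").split(/[\/?#:]/)[0];
  var parts = host.split(".");
  // Hosts with at most two labels (and IPv4 addresses) are returned unchanged.
  if (parts.length <= 2 || /^\d+(\.\d+){3}$/.test(host)) {
    return host;
  }
  var tld = parts[parts.length - 1];
  var sld = parts[parts.length - 2];
  // Assumption: a two-letter TLD preceded by one of these labels forms a
  // two-part public suffix (e.g. "co.uk"). A real list is far longer.
  var secondLevel = ["co", "com", "org", "net", "gov", "edu", "ac"];
  var take = (tld.length === 2 && secondLevel.indexOf(sld) !== -1) ? 3 : 2;
  return parts.slice(-take).join(".");
}

console.log(extractRootDomain("http://www.youtube.com/watch?v=ClkQA2Lb_iE")); // youtube.com
console.log(extractRootDomain("www.example.co.uk:8080/path")); // example.co.uk
```

A hand-written suffix list like this will miss many real suffixes, which is the gap the Public Suffix List approach is meant to close.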

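3. URL(url).hostname

URL(url).hostname is a valid solution, but it doesn't handle some of the edge cases that I have addressed. As you can see in the last test, it doesn't like some of the URLs. You can definitely use a combination of my solutions to make it all work.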
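One such edge case is input without a protocol: the URL constructor throws a TypeError for a bare string like "example.com/random". A minimal sketch of a guard, assuming that any input without "//" is a bare host that can be prefixed with "http://" (safeHostname is a hypothetical name):

```javascript
// Hypothetical wrapper around the URL constructor.
function safeHostname(input) {
  // Assumption: inputs without "//" are bare hosts such as "example.com/random".
  const candidate = input.includes("//") ? input : "http://" + input;
  try {
    return new URL(candidate).hostname;
  } catch (e) {
    // The constructor throws a TypeError when the string cannot be parsed.
    return null;
  }
}

console.log(safeHostname("http://www.example.com/12xy45")); // www.example.com
console.log(safeHostname("example.com/random"));            // example.com
```

Returning null keeps the failure case explicit, so the caller can decide whether to fall back to another approach.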

*Thanks to @Timmerz, @renoirb, @rineez, @BigDong, @ra00l, @ILikeBeansTacos and @CharlesRobertson for your suggestions! @ross-allen, thank you for reporting the bug!

2014-05-30 00:06:20

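There are two good solutions, depending on whether you need to optimize for performance (and neither requires external dependencies!):

1. Use URL.hostname for readability

The cleanest and simplest solution is to use URL.hostname.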

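const getHostname = (url) => {
  // use the URL constructor and return the hostname
  return new URL(url).hostname;
}

// tests
console.log(getHostname("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/"));
console.log(getHostname("https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname"));

URL.hostname is part of the URL API, which is supported by all major browsers except IE (caniuse). Use a URL polyfill if you need to support legacy browsers.

Bonus: using the URL constructor also gives you access to other URL properties and methods!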
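As an illustration of those other properties, here is a short sketch using a made-up example URL (the address is an assumption, chosen only to exercise each part):

```javascript
// Parse once, then read the individual components of the URL.
const parsed = new URL("https://example.com:8080/path/page?query=1#top");

console.log(parsed.protocol); // "https:"
console.log(parsed.hostname); // "example.com"
console.log(parsed.port);     // "8080"
console.log(parsed.host);     // "example.com:8080"
console.log(parsed.pathname); // "/path/page"
console.log(parsed.search);   // "?query=1"
console.log(parsed.hash);     // "#top"

// searchParams gives access to individual query parameters.
console.log(parsed.searchParams.get("query")); // "1"
```

Note that hostname excludes the port while host includes it.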

2. Use RegEx for performance

URL.hostname should be your choice for most use cases. However, it is still much slower than this regex (test it yourself on jsPerf):

const getHostnameFromRegex = (url) => {
  // run the URL against the regex
  const matches = url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i);
  // extract the hostname (null if no match is found)
  return matches && matches[1];
}

// tests
console.log(getHostnameFromRegex("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/"));
console.log(getHostnameFromRegex("https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname"));

TL;DR

You should probably use URL.hostname. If you need to process a very large number of URLs (where performance is a factor), consider the RegEx.

2019-03-01 15:33:37
