Strip HTML from text in JavaScript

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

Current answer

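function strip_html_tags(str)
{
   // Nothing to strip from null, undefined, or an empty string
   if ((str == null) || (str === ''))
       return false;
   // Coerce numbers and other values to a string before matching
   str = str.toString();
   // Remove anything that looks like a tag: "<", any run of non-">" characters, ">"
   return str.replace(/<[^>]*>/g, '');
}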
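
To see what a tag-matching pattern like this does and does not handle, here is a minimal sketch. `stripTags` is a hypothetical helper name used only for illustration; it applies the same regular expression and is not part of the answer itself.

```javascript
// Hypothetical helper: applies the same tag-matching pattern as the answer.
function stripTags(html) {
  return String(html).replace(/<[^>]*>/g, '');
}

// Ordinary markup is reduced to its text content.
const simple = stripTags('<p>Hello <b>world</b>!</p>');

// A literal "<" followed later by ">" is treated as one tag,
// so everything between them is removed as well.
const math = stripTags('1 < 2 and 3 > 2');

console.log(simple); // Hello world!
console.log(math);   // 1  2
```

The second call shows the main limitation: the pattern cannot tell a comparison such as `1 < 2` apart from markup, so plain text containing angle brackets may lose content.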

2018-07-04 21:59:23

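Other answers

Converting HTML to a plain-text email while keeping hyperlinks (a href) intact

The function posted above by hypoxide works fine, but what I was after was something that would take HTML created in a web rich-text editor (for example FCKEditor) and clear out all the HTML while keeping all the links, because I wanted both the HTML and the plain-text version to help build the correct parts of an SMTP email (both HTML and plain text).

After a long time searching Google, my colleagues and I came up with this, using the regular-expression engine in JavaScript: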

var str = 'this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>';
// Line breaks and paragraph openings become newlines
str = str.replace(/<br\s*\/?>/gi, "\n");
str = str.replace(/<p\b[^>]*>/gi, "\n");
// Keep each link as " text (Link->url) "
str = str.replace(/<a\b[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/gi, " $2 (Link->$1) ");
// Remove every remaining tag
str = str.replace(/<(?:.|\s)*?>/g, "");

The str variable starts out like this:

this string has <i>html</i> code i want to <b>remove</b><br>Link Number 1 -><a href="http://www.bbc.co.uk">BBC</a> Link Number 1<br><p>Now back to normal text and stuff</p>

After the code runs, it looks like this:

this string has html code i want to remove
Link Number 1 -> BBC (Link->http://www.bbc.co.uk)  Link Number 1


Now back to normal text and stuff

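As you can see, all the HTML has been removed and the link has been preserved, with the hyperlinked text still intact. I have also replaced the <p> and <br> tags with \n (a newline character) so that some sort of visual formatting is retained.

To change the link format (e.g. BBC (Link->http://www.bbc.co.uk)), just edit the $2 (Link->$1), where $1 is the href URL/URI and $2 is the hyperlinked text. With the links directly in the body of the plain text, most SMTP mail clients convert them so the user can click on them.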
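
The link format can also be made a parameter instead of being edited inside the pattern. The sketch below assumes a hypothetical wrapper `htmlToPlainText` and a hypothetical `formatLink` callback; neither name comes from the answer, and the patterns are slightly tightened variants of the ones above.

```javascript
// Hypothetical wrapper around the replacement steps described above.
// formatLink(text, url) is an assumed callback that decides how a link is rendered.
function htmlToPlainText(html, formatLink) {
  // Default mirrors the " $2 (Link->$1) " format from the answer.
  const fmt = formatLink || ((text, url) => ` ${text} (Link->${url}) `);
  return html
    .replace(/<br\s*\/?>/gi, '\n')              // line breaks become newlines
    .replace(/<p\b[^>]*>/gi, '\n')              // paragraph openings become newlines
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/gi,
      (match, url, text) => fmt(text, url))     // keep link text and URL
    .replace(/<[^>]*>/g, '');                   // drop every remaining tag
}

const sample = 'Read <a href="http://www.bbc.co.uk">BBC</a> now<br>Bye';

const defaultText = htmlToPlainText(sample);
// Square brackets are used here because angle brackets would be
// removed by the final tag-stripping step.
const customText = htmlToPlainText(sample, (text, url) => `${text} [${url}]`);

console.log(defaultText);
console.log(customText);
```

Passing a function as the replacement keeps the URL and link text as separate arguments, so the output format can change without touching the regular expression.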

Hope you find this useful.

2009-08-06 08:30:22

This should do the work in any JavaScript environment (NodeJS included).

    const text = `
    <html lang="en">
      <head>
        <style type="text/css">*{color:red}</style>
        <script>alert('hello')</script>
      </head>
      <body><b>This is some text</b><br/></body>
    </html>`;
    
    // Strings are immutable, so keep the result of the chained replaces
    const plainText = text
        // Remove style tags and content ([\s\S]*? also spans line breaks)
        .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')
        // Remove script tags and content
        .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
        // Remove all opening, closing and orphan HTML tags
        .replace(/<[^>]+>/g, '')
        // Remove leading spaces and repeated CR/LF
        .replace(/([\r\n]+ +)+/g, '');
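
One detail worth checking with patterns of this kind: in JavaScript, `.` does not match line breaks unless the `s` flag is set, so a `<style>` or `<script>` block that spans several lines needs something like `[\s\S]*?`. The sketch below illustrates the difference; `stripDocument` and the sample `page` are hypothetical names introduced for this example only.

```javascript
// Hypothetical helper: removes style/script blocks (including multi-line ones),
// then all remaining tags, then collapses whitespace.
function stripDocument(html) {
  return html
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '')
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '')
    .replace(/<[^>]+>/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

const page = [
  '<html><head>',
  '<style>',
  '  * { color: red }',
  '</style>',
  '<script>alert("hello")</script>',
  '</head>',
  '<body><b>This is some text</b><br/></body></html>'
].join('\n');

// With ".", the multi-line style block is not matched and its CSS survives.
const cssLeftBehind = page
  .replace(/<style[^>]*>.*<\/style>/g, '')
  .includes('color: red');

const pageText = stripDocument(page);

console.log(cssLeftBehind); // true
console.log(pageText);      // This is some text
```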

2017-01-20 05:49:54

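A lot of people have answered this already, but I thought it might be useful to share the function I wrote, which removes HTML tags from a string but lets you pass an array of tags that you do not want removed. It's pretty short and has been working nicely for me.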

function removeTags(string, array){
  // Split on "<": every piece after the first begins with a tag name
  return string.split("<").map(function(piece, i){
    if (i === 0) return piece; // text before the first tag
    var end = piece.indexOf(">");
    if (end === -1) return piece; // no closing ">", treat as plain text
    // Tag name without the leading "/" or any attributes, e.g. "span"
    var name = piece.slice(0, end).replace(/^\//, "").split(/[\s\/]/)[0].toLowerCase();
    var keep = array && array.indexOf(name) !== -1;
    return keep ? "<" + piece : piece.slice(end + 1);
  }).join("");
}

var x = "<span><i>Hello</i> <b>world</b>!</span>";
console.log(removeTags(x)); // Hello world!
console.log(removeTags(x, ["span", "i"])); // <span><i>Hello</i> world!</span>

2017-01-27 06:55:53

A safer way to strip HTML with jQuery is to first use jQuery.parseHTML to create a DOM, ignoring any scripts, then let jQuery build an element and retrieve only the text.

function stripHtml(unsafe) {
    return $($.parseHTML(unsafe)).text();
}

This can safely strip HTML from:

<img src="unknown.gif" onerror="console.log('running injections');">

and other exploits.

nJoy!

2019-03-25 20:44:36

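Method 1:

function cleanHTML(str){
  // Escape each <...> pair so the tag is shown as text instead of rendered
  return str.replace(/<(.*?)>/g, '&lt;$1&gt;');
}

function uncleanHTML(str){
  // Reverse of cleanHTML: turn escaped pairs back into tags
  return str.replace(/&lt;(.*?)&gt;/g, '<$1>');
}

Method 2:

function cleanHTML(str){
  // Escape every angle bracket, paired or not
  return str.replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function uncleanHTML(str){
  // Reverse of cleanHTML
  return str.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}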

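Also, don't forget that if a user happens to post a math comment (for example, 1<2), you don't want to strip the whole comment. Browsers (I only tested Chrome) do not interpret character entities as HTML tags. If you replace every < in the string with &lt;, the entity is displayed as a literal < and no HTML runs. I recommend method 2. jQuery also works well: $('#element').text();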
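
The escaping approach can be sketched so that it round-trips cleanly. The helper names `escapeHtml` and `unescapeHtml` are hypothetical, and escaping `&` as well is an addition assumed here so that text which already contains something like `&lt;` is restored unchanged.

```javascript
// Hypothetical helper: escape "&" first so the entities produced for "<" and ">"
// are not themselves re-escaped.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: reverse the steps in the opposite order,
// decoding "&amp;" last.
function unescapeHtml(str) {
  return String(str)
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

const comment = '1 < 2 && <b>bold</b>';
const escaped = escapeHtml(comment);
const restored = unescapeHtml(escaped);

console.log(escaped);  // 1 &lt; 2 &amp;&amp; &lt;b&gt;bold&lt;/b&gt;
console.log(restored); // 1 < 2 && <b>bold</b>
```

Nothing is deleted with this approach, so a comparison such as `1 < 2` survives intact while the markup is displayed as text rather than rendered.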

2019-12-14 21:28:33
