Strip HTML from Text JavaScript

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

Current answer

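const strip = (text) => {
  // Parse the string into a detached HTML document (scripts in it are not executed)
  const doc = new DOMParser().parseFromString(text, "text/html");
  // textContent joins all text nodes, dropping the tags themselves
  return doc.body?.textContent ?? "";
};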

const value=document.getElementById("idOfEl").value

const cleanText=strip(value)

2022-01-19 08:53:18

Other answers

The code below lets you keep some HTML tags while stripping out all the others:

function strip_tags(input, allowed) {

  allowed = (((allowed || '') + '')
    .toLowerCase()
    .match(/<[a-z][a-z0-9]*>/g) || [])
    .join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)

  var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
      commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;

  return input.replace(commentsAndPhpTags, '')
      .replace(tags, function($0, $1) {
          return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
      });
}
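To see what the allow-list handling at the top of strip_tags actually does, here is a small sketch that reproduces just that step. normalizeAllowed and isAllowed are hypothetical helper names introduced for illustration; the regular expression and the indexOf check are taken from the function above.

```javascript
// Hypothetical helper: reproduces only the normalization step of strip_tags,
// turning a loosely written allow-list into a string of lowercase "<tag>" tokens.
function normalizeAllowed(allowed) {
  return (String(allowed || '').toLowerCase().match(/<[a-z][a-z0-9]*>/g) || []).join('');
}

// Hypothetical helper: a tag survives only if "<name>" occurs in the normalized list.
function isAllowed(tagName, allowed) {
  return normalizeAllowed(allowed).indexOf('<' + tagName.toLowerCase() + '>') > -1;
}

console.log(normalizeAllowed('<I><B> <br/>')); // <i><b>
console.log(isAllowed('B', '<i><b>'));         // true
console.log(isAllowed('br', '<I><B> <br/>'));  // false
console.log(isAllowed('br', '<br>'));          // true
```

One consequence: a self-closing entry such as <br/> does not match /<[a-z][a-z0-9]*>/, so it is silently dropped from the allow-list. Write <br> instead.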

2015-07-14 12:56:53

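I think the easiest way is to just use a regular expression, as others mentioned above, although there's no reason to use a bunch of them. Try: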

stringWithHTML = stringWithHTML.replace(/<\/?[a-z][a-z0-9]*[^<>]*>/ig, "");
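A quick check of that one-liner, plus two inputs where a single regular expression gives results you might not expect. This is a sketch for illustration only; stripTags is a hypothetical wrapper name and the sample strings are made up.

```javascript
// The regex from the answer: "<", an optional "/", a tag name starting
// with a letter, then any run of characters other than "<" or ">", then ">".
const TAG_RE = /<\/?[a-z][a-z0-9]*[^<>]*>/ig;

// Hypothetical wrapper around the one-liner.
const stripTags = (html) => html.replace(TAG_RE, '');

// Ordinary markup is removed as expected.
console.log(stripTags('<p>Hello <b>world</b></p>')); // Hello world

// A ">" inside an attribute value ends the match early,
// so part of the attribute leaks into the output.
console.log(stripTags('<a title="x>y">link</a>'));   // y">link

// Entities are not decoded; "&amp;" stays as-is.
console.log(stripTags('<i>Tom &amp; Jerry</i>'));    // Tom &amp; Jerry
```

For input you do not control, the parser-based answers on this page avoid both problems.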

2011-01-10 05:40:34

For escaped characters, pattern matching also works:

myString = myString.replace(/(?:&lt;?|<)[\s\S]*?(?:&gt;?|>)/g, '');
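A precedence pitfall with this kind of pattern: written as /((&lt)|(<)(?:.|\n)*?(&gt)|(>))/gm, the top-level | splits it into three independent alternatives (&lt, then < ... &gt, then >) instead of "an opening delimiter, anything, a closing delimiter". The sketch below compares that form with a grouped variant; the grouped regex is an illustrative rewrite, not taken from the original answer, and it accepts the entities with or without the trailing semicolon.

```javascript
// Ungrouped form: three separate top-level alternatives.
const UNGROUPED = /((&lt)|(<)(?:.|\n)*?(&gt)|(>))/gm;

// Grouped variant (illustrative): either opening delimiter, the shortest
// run of any characters (newlines included), then either closing delimiter.
const GROUPED = /(?:&lt;?|<)[\s\S]*?(?:&gt;?|>)/g;

const escaped = '&lt;b&gt;hi&lt;/b&gt;';

console.log(escaped.replace(UNGROUPED, ''));     // ;b&gt;hi;/b&gt;
console.log('<b>hi</b>'.replace(UNGROUPED, '')); // <bhi</b
console.log(escaped.replace(GROUPED, ''));       // hi
console.log('<b>hi</b>'.replace(GROUPED, ''));   // hi

// Caveat: ordinary text that uses entities for comparisons is removed too
// (the result keeps both surrounding spaces: "1  2").
console.log('1 &lt; 2 and 3 &gt; 2'.replace(GROUPED, ''));
```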

2016-11-08 10:44:34

You can also use the excellent htmlparser2 pure-JS HTML parser. Here is a working demo:

var htmlparser = require('htmlparser2');

var body = '<p><div>This is </div>a <span>simple </span> <img src="test"></img>example.</p>';

var result = [];

var parser = new htmlparser.Parser({
    ontext: function(text){
        result.push(text);
    }
}, {decodeEntities: true});

parser.write(body);
parser.end();

console.log(result.join(''));

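The output will be: This is a simple  example. (There are two spaces before "example" because the text between </span> and <img> is itself a single space.)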

See it in action here: https://tonicdev.com/jfahrenkrug/extract-text-from-html

This works both in Node and in the browser if you package your web app with a tool like webpack.

2015-12-29 19:11:59

This package works really well for stripping HTML: https://www.npmjs.com/package/string-strip-html

It works both in the browser and on the server (e.g. Node.js).

2021-07-11 08:13:33
