Strip HTML from Text JavaScript

Is there an easy way to take a string of HTML in JavaScript and strip out the HTML?

Current answer

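Another solution, admittedly less elegant than nickf's or Shog9's, would be to recursively walk the DOM starting at the <body> tag and append each text node.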

    var bodyContent = document.getElementsByTagName('body')[0];
    var result = appendTextNodes(bodyContent);

    function appendTextNodes(element) {
        var text = '';

        // Loop through the childNodes of the passed-in element
        for (var i = 0, len = element.childNodes.length; i < len; i++) {
            // Get a reference to the current child
            var node = element.childNodes[i];
            // Append the node's value if it's a text node
            if (node.nodeType === Node.TEXT_NODE) {
                text += node.nodeValue;
            }
            // Recurse through the node's children, if there are any,
            // and append the text collected from them
            if (node.childNodes.length > 0) {
                text += appendTextNodes(node);
            }
        }
        // Return the final result
        return text;
    }
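Because this recursive walk only relies on nodeType, nodeValue and childNodes, the algorithm can be exercised outside a browser. The sketch below assumes plain objects shaped like DOM nodes; the helpers textNode and elementNode are hypothetical stand-ins, not a real DOM API. It also shows why the value returned by the recursive call has to be appended to text: discarding it would lose the text of every nested element.

```javascript
// Numeric node types as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE)
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helpers that build plain objects mimicking the DOM node shape
function textNode(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, childNodes: [] };
}

function elementNode(...children) {
  return { nodeType: ELEMENT_NODE, nodeValue: null, childNodes: children };
}

// Same algorithm as appendTextNodes: collect text nodes depth-first,
// appending the text returned by each recursive call
function appendTextNodes(element) {
  let text = '';
  for (const node of element.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      text += node.nodeValue;
    }
    if (node.childNodes.length > 0) {
      text += appendTextNodes(node);
    }
  }
  return text;
}

// Roughly models <body>Hello <b>big <i>bold</i></b> world</body>
const body = elementNode(
  textNode('Hello '),
  elementNode(textNode('big '), elementNode(textNode('bold'))),
  textNode(' world')
);

console.log(appendTextNodes(body)); // "Hello big bold world"
```

In a real page the walk would also collect the contents of <script> and <style> elements, since those are text nodes too.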

2009-05-04 23:14:30

Other answers

This should do the job in any JavaScript environment (including NodeJS).

    const text = `
    <html lang="en">
      <head>
        <style type="text/css">*{color:red}</style>
        <script>alert('hello')</script>
      </head>
      <body><b>This is some text</b><br/></body>
    </html>`;

    // String.prototype.replace returns a new string, so keep the result
    const stripped = text
        // Remove style tags and their content ([\s\S]*? also matches across newlines)
        .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')
        // Remove script tags and their content
        .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
        // Remove all opening, closing and orphan HTML tags
        .replace(/<[^>]+>/g, '')
        // Remove leading spaces and repeated CR/LF
        .replace(/([\r\n]+ +)+/g, '');
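The same chain can be wrapped in a reusable function. This is a minimal sketch, and the name stripHtml is hypothetical. It uses [\s\S]*? so that <script> and <style> blocks spanning several lines are removed; with .* they would survive, because . does not match a newline. Like any regex approach it does not decode entities such as &amp; and can be fooled by unusual markup, so it is not a sanitizer.

```javascript
// Hypothetical helper wrapping the regex chain from the answer above
function stripHtml(html) {
  return html
    // Remove <style> blocks, including ones that span several lines
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')
    // Remove <script> blocks, including ones that span several lines
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
    // Remove every remaining tag
    .replace(/<[^>]+>/g, '')
    // Remove line breaks followed by indentation
    .replace(/([\r\n]+ +)+/g, '')
    // Drop any leading/trailing whitespace left over
    .trim();
}

const page = '<script>\n  alert("hi");\n</script>\n  <p>Hello <b>world</b></p>';

// With .* instead of [\s\S]*?, the multi-line script body would survive
console.log(stripHtml(page)); // "Hello world"
```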

2017-01-20 05:49:54

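    (function ($) {
        $.html2text = function (html) {
            // Create the hidden scratch element once, with the same id we look up
            if ($('#scratch_pad').length === 0) {
                $('<div id="scratch_pad" style="display:none"></div>').appendTo('body');
            }
            // Let the browser parse the HTML, then read back only the text
            return $('#scratch_pad').html(html).text();
        };
    })(jQuery);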

Define it as a jQuery plugin and use it like this:

    $.html2text(htmlContent);

2012-03-16 06:25:57

If you don't want to create a DOM for this (perhaps you're not in a browser context), you can use the striptags npm package.

    import striptags from 'striptags'; // ES module syntax <-- pick one
    const striptags = require('striptags'); // CommonJS (require) <-- pick one

    striptags('<p>An HTML string</p>');
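If you would rather avoid both a DOM and an extra dependency, a small character-by-character scanner is one DOM-free alternative to the regex answers on this page. The sketch below is hypothetical (the name stripTagsSimple is made up) and is not the striptags source. It tracks whether it is inside a tag and inside a quoted attribute value, so a > inside quotes does not end the tag early, which the regex <[^>]+> gets wrong. A limitation to keep in mind: a bare < in plain text, such as "1 < 2", is still treated as the start of a tag.

```javascript
// Hypothetical helper: strips tags with a tiny state machine instead of a regex
function stripTagsSimple(html) {
  let out = '';
  let inTag = false; // true between '<' and the matching '>'
  let quote = null;  // the quote character currently open inside a tag, or null

  for (const ch of html) {
    if (!inTag) {
      // Plain text: start a tag on '<', otherwise keep the character
      if (ch === '<') inTag = true;
      else out += ch;
    } else if (quote) {
      // Inside a quoted attribute value: only the matching quote ends it
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '>') {
      inTag = false;
    }
  }
  return out;
}

console.log(stripTagsSimple('<a title="1 > 0">link</a> text')); // "link text"
```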

2021-07-05 09:31:20

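As an extension to the jQuery method: if your string might not contain HTML (e.g. if you are trying to strip HTML from a form field), then

    jQuery(html).text();

will return an empty string when there is no HTML. Use

    jQuery('<p>' + html + '</p>').text();

instead.

Update: As pointed out in the comments, in some circumstances this solution will execute JavaScript contained in the html string. If an attacker could influence the value of html, use a different solution.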

2013-01-15 12:20:49

For escaped characters, pattern matching also works:

    myString.replace(/(&lt;|<)[\s\S]*?(&gt;|>)/g, '');
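A sketch of this idea as a reusable function, assuming the goal is to remove whole tags whether they are written as raw markup (<b>) or HTML-escaped (&lt;b&gt;). The name stripTagsAndEscapedTags is hypothetical. The two spellings of each delimiter are grouped, as (&lt;|<) and (&gt;|>), because | has the lowest precedence in a regular expression; without that grouping the pattern splits into separate branches that delete stray &lt or > characters on their own.

```javascript
// Hypothetical helper: removes tags written either as raw markup (<b>)
// or as HTML-escaped text (&lt;b&gt;)
function stripTagsAndEscapedTags(str) {
  // (&lt;|<) opens a tag, [\s\S]*? lazily spans anything (newlines included),
  // and (&gt;|>) closes it at the first possible point
  return str.replace(/(&lt;|<)[\s\S]*?(&gt;|>)/g, '');
}

console.log(stripTagsAndEscapedTags('&lt;b&gt;bold&lt;/b&gt; and <i>italic</i>'));
// "bold and italic"
```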

2016-11-08 10:44:34
