我有一个包含XML的Java字符串,没有换行或缩进。我想把它变成一个字符串与格式良好的XML。我怎么做呢?

String unformattedXml = "<tag><nested>hello</nested></tag>";
String formattedXml = new [UnknownClass]().format(unformattedXml);

注意:我的输入是一个字符串。输出是一个字符串。

(基本)模拟结果:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag>
    <nested>hello</nested>
  </tag>
</root>

当前回答

基于这个答案的一个更简单的解决方案:

public static String prettyFormat(String input, int indent) {
    try {
        Source xmlInput = new StreamSource(new StringReader(input));
        StringWriter stringWriter = new StringWriter();
        StreamResult xmlOutput = new StreamResult(stringWriter);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        transformerFactory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        Transformer transformer = transformerFactory.newTransformer(); 
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(xmlInput, xmlOutput);
        return xmlOutput.getWriter().toString();
    } catch (Exception e) {
        throw new RuntimeException(e); // simple exception handling, please review it
    }
}

public static String prettyFormat(String input) {
    return prettyFormat(input, 2);
}

testcase:

prettyFormat("<root><child>aaa</child><child/></root>");

返回:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <child>aaa</child>
  <child/>
</root>

//忽略:原始编辑只需要在代码中的类名中缺少s。为了在SO上获得超过6个字符的验证,添加了多余的6个字符

其他回答

关于“您必须首先构建DOM树”的评论:不,您不需要也不应该这样做。

相反,创建一个StreamSource(new StreamSource(new StringReader(str)),并将其提供给前面提到的标识转换器。这将使用SAX解析器,结果将快得多。 在这种情况下,构建中间树纯粹是开销。 否则,排名第一的答案是好的。

这是我自己问题的答案。我将各种结果的答案结合起来,编写了一个输出XML的类。

不保证它如何响应无效的XML或大型文档。

package ecb.sdw.pretty;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Pretty-prints xml, supplied as a string.
 * <p/>
 * eg.
 * <code>
 * String formattedXml = new XmlFormatter().format("<tag><nested>hello</nested></tag>");
 * </code>
 */
public class XmlFormatter {

    public XmlFormatter() {
    }

    public String format(String unformattedXml) {
        try {
            final Document document = parseXmlFile(unformattedXml);

            OutputFormat format = new OutputFormat(document);
            format.setLineWidth(65);
            format.setIndenting(true);
            format.setIndent(2);
            Writer out = new StringWriter();
            XMLSerializer serializer = new XMLSerializer(out, format);
            serializer.serialize(document);

            return out.toString();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    private Document parseXmlFile(String in) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource(new StringReader(in));
            return db.parse(is);
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String unformattedXml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><QueryMessage\n" +
                        "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n" +
                        "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n" +
                        "    <Query>\n" +
                        "        <query:CategorySchemeWhere>\n" +
                        "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n" +
                        "        </query:CategorySchemeWhere>\n" +
                        "    </Query>\n\n\n\n\n" +
                        "</QueryMessage>";

        System.out.println(new XmlFormatter().format(unformattedXml));
    }

}

对于那些寻找快速和肮脏的解决方案的人——它不需要XML是100%有效的。例如,在REST / SOAP日志的情况下(你永远不知道其他人发送了什么;-))

我发现并改进了一个我在网上找到的代码剪辑,我认为这仍然是一个有效的可能的方法:

public static String prettyPrintXMLAsString(String xmlString) {
    /* Remove new lines */
    final String LINE_BREAK = "\n";
    xmlString = xmlString.replaceAll(LINE_BREAK, "");
    StringBuffer prettyPrintXml = new StringBuffer();
    /* Group the xml tags */
    Pattern pattern = Pattern.compile("(<[^/][^>]+>)?([^<]*)(</[^>]+>)?(<[^/][^>]+/>)?");
    Matcher matcher = pattern.matcher(xmlString);
    int tabCount = 0;
    while (matcher.find()) {
        String str1 = (null == matcher.group(1) || "null".equals(matcher.group())) ? "" : matcher.group(1);
        String str2 = (null == matcher.group(2) || "null".equals(matcher.group())) ? "" : matcher.group(2);
        String str3 = (null == matcher.group(3) || "null".equals(matcher.group())) ? "" : matcher.group(3);
        String str4 = (null == matcher.group(4) || "null".equals(matcher.group())) ? "" : matcher.group(4);

        if (matcher.group() != null && !matcher.group().trim().equals("")) {
            printTabs(tabCount, prettyPrintXml);
            if (!str1.equals("") && str3.equals("")) {
                ++tabCount;
            }
            if (str1.equals("") && !str3.equals("")) {
                --tabCount;
                prettyPrintXml.deleteCharAt(prettyPrintXml.length() - 1);
            }

            prettyPrintXml.append(str1);
            prettyPrintXml.append(str2);
            prettyPrintXml.append(str3);
            if (!str4.equals("")) {
                prettyPrintXml.append(LINE_BREAK);
                printTabs(tabCount, prettyPrintXml);
                prettyPrintXml.append(str4);
            }
            prettyPrintXml.append(LINE_BREAK);
        }
    }
    return prettyPrintXml.toString();
}

private static void printTabs(int count, StringBuffer stringBuffer) {
    for (int i = 0; i < count; i++) {
        stringBuffer.append("\t");
    }
}

public static void main(String[] args) {
    String x = new String(
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body><soap:Fault><faultcode>soap:Client</faultcode><faultstring>INVALID_MESSAGE</faultstring><detail><ns3:XcbSoapFault xmlns=\"\" xmlns:ns3=\"http://www.someapp.eu/xcb/types/xcb/v1\"><CauseCode>20007</CauseCode><CauseText>INVALID_MESSAGE</CauseText><DebugInfo>Problems creating SAAJ object model</DebugInfo></ns3:XcbSoapFault></detail></soap:Fault></soap:Body></soap:Envelope>");
    System.out.println(prettyPrintXMLAsString(x));
}

输出如下:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>INVALID_MESSAGE</faultstring>
        <detail>
            <ns3:XcbSoapFault xmlns="" xmlns:ns3="http://www.someapp.eu/xcb/types/xcb/v1">
                <CauseCode>20007</CauseCode>
                <CauseText>INVALID_MESSAGE</CauseText>
                <DebugInfo>Problems creating SAAJ object model</DebugInfo>
            </ns3:XcbSoapFault>
        </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>

如果你不需要缩进那么多,但一些换行,这可能是足够的简单regex…

String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");

代码很好,而不是因为缺少缩进而导致的结果。


(对于有缩进的解,请参见其他答案。)

I have found that in Java 1.6.0_32 the normal method to pretty print an XML string (using a Transformer with a null or identity xslt) does not behave as I would like if tags are merely separated by whitespace, as opposed to having no separating text. I tried using <xsl:strip-space elements="*"/> in my template to no avail. The simplest solution I found was to strip the space the way I wanted using a SAXSource and XML filter. Since my solution was for logging I also extended this to work with incomplete XML fragments. Note the normal method seems to work fine if you use a DOMSource but I did not want to use this because of the incompleteness and memory overhead.

public static class WhitespaceIgnoreFilter extends XMLFilterImpl
{

    @Override
    public void ignorableWhitespace(char[] arg0,
                                    int arg1,
                                    int arg2) throws SAXException
    {
        //Ignore it then...
    }

    @Override
    public void characters( char[] ch,
                            int start,
                            int length) throws SAXException
    {
        if (!new String(ch, start, length).trim().equals("")) 
               super.characters(ch, start, length); 
    }
}

public static String prettyXML(String logMsg, boolean allowBadlyFormedFragments) throws SAXException, IOException, TransformerException
    {
        TransformerFactory transFactory = TransformerFactory.newInstance();
        transFactory.setAttribute("indent-number", new Integer(2));
        Transformer transformer = transFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        StringWriter out = new StringWriter();
        XMLReader masterParser = SAXHelper.getSAXParser(true);
        XMLFilter parser = new WhitespaceIgnoreFilter();
        parser.setParent(masterParser);

        if(allowBadlyFormedFragments)
        {
            transformer.setErrorListener(new ErrorListener()
            {
                @Override
                public void warning(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void fatalError(TransformerException exception) throws TransformerException
                {
                }

                @Override
                public void error(TransformerException exception) throws TransformerException
                {
                }
            });
        }

        try
        {
            transformer.transform(new SAXSource(parser, new InputSource(new StringReader(logMsg))), new StreamResult(out));
        }
        catch (TransformerException e)
        {
            if(e.getCause() != null && e.getCause() instanceof SAXParseException)
            {
                if(!allowBadlyFormedFragments || !"XML document structures must start and end within the same entity.".equals(e.getCause().getMessage()))
                {
                    throw e;
                }
            }
            else
            {
                throw e;
            }
        }
        out.flush();
        return out.toString();
    }