如果您有java.io.InputStream对象,应该如何处理该对象并生成String?
假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。
获取InputStream并将其转换为字符串的最简单方法是什么?
public String convertStreamToString(InputStream is) {
// ???
}
如果您有java.io.InputStream对象,应该如何处理该对象并生成String?
假设我有一个包含文本数据的InputStream,我想将其转换为String,例如,我可以将其写入日志文件。
获取InputStream并将其转换为字符串的最简单方法是什么?
public String convertStreamToString(InputStream is) {
// ???
}
当前回答
在Groovy中
inputStream.getText()
其他回答
下面的代码对我有用。
URL url = MyClass.class.getResource("/" + configFileName);
BufferedInputStream bi = (BufferedInputStream) url.getContent();
byte[] buffer = new byte[bi.available() ];
int bytesRead = bi.read(buffer);
String out = new String(buffer);
请注意,根据Java文档,available()方法可能不适用于InputStream,但始终适用于BufferedInputStream。如果您不想使用available()方法,我们可以始终使用以下代码
URL url = MyClass.class.getResource("/" + configFileName);
BufferedInputStream bi = (BufferedInputStream) url.getContent();
File f = new File(url.getPath());
byte[] buffer = new byte[ (int) f.length()];
int bytesRead = bi.read(buffer);
String out = new String(buffer);
我不确定是否会有任何编码问题。如果代码有任何问题,请发表评论。
这个很好,因为:
它可以安全地处理Charset。您可以控制读取缓冲区的大小。您可以设置生成器的长度,而不必是精确的值。不受库依赖关系的影响。适用于Java 7或更高版本。
怎么做?
public static String convertStreamToString(InputStream is) throws IOException {
StringBuilder sb = new StringBuilder(2048); // Define a size if you have an idea of it.
char[] read = new char[128]; // Your buffer size.
try (InputStreamReader ir = new InputStreamReader(is, StandardCharsets.UTF_8)) {
for (int i; -1 != (i = ir.read(read)); sb.append(read, 0, i));
}
return sb.toString();
}
对于JDK 9
public static String inputStreamString(InputStream inputStream) throws IOException {
try (inputStream) {
return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
}
}
异-8859-1
如果您知道输入流的编码是ISO-8859-1或ASCII,这里有一种非常高效的方法来实现这一点。它(1)避免了StringWriter的内部StringBuffer中存在的不必要的同步,(2)避免了InputStreamReader的开销,(3)最小化了必须复制StringBuilder的内部字符数组的次数。
public static String iso_8859_1(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
chars.append((char)(buffer[i] & 0xFF));
}
}
return chars.toString();
}
UTF-8型
对于使用UTF-8编码的流,可以使用相同的通用策略:
public static String utf8(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
int state = 0;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
if ((state = nextStateUtf8(state, buffer[i])) >= 0) {
chars.appendCodePoint(state);
} else if (state == -1) { //error
state = 0;
chars.append('\uFFFD'); //replacement char
}
}
}
return chars.toString();
}
其中nextStateUtf8()函数定义如下:
/**
* Returns the next UTF-8 state given the next byte of input and the current state.
* If the input byte is the last byte in a valid UTF-8 byte sequence,
* the returned state will be the corresponding unicode character (in the range of 0 through 0x10FFFF).
* Otherwise, a negative integer is returned. A state of -1 is returned whenever an
* invalid UTF-8 byte sequence is detected.
*/
static int nextStateUtf8(int currentState, byte nextByte) {
switch (currentState & 0xF0000000) {
case 0:
if ((nextByte & 0x80) == 0) { //0 trailing bytes (ASCII)
return nextByte;
} else if ((nextByte & 0xE0) == 0xC0) { //1 trailing byte
if (nextByte == (byte) 0xC0 || nextByte == (byte) 0xC1) { //0xCO & 0xC1 are overlong
return -1;
} else {
return nextByte & 0xC000001F;
}
} else if ((nextByte & 0xF0) == 0xE0) { //2 trailing bytes
if (nextByte == (byte) 0xE0) { //possibly overlong
return nextByte & 0xA000000F;
} else if (nextByte == (byte) 0xED) { //possibly surrogate
return nextByte & 0xB000000F;
} else {
return nextByte & 0x9000000F;
}
} else if ((nextByte & 0xFC) == 0xF0) { //3 trailing bytes
if (nextByte == (byte) 0xF0) { //possibly overlong
return nextByte & 0x80000007;
} else {
return nextByte & 0xE0000007;
}
} else if (nextByte == (byte) 0xF4) { //3 trailing bytes, possibly undefined
return nextByte & 0xD0000007;
} else {
return -1;
}
case 0xE0000000: //3rd-to-last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0x80000000: //3rd-to-last continuation byte, check overlong
return (nextByte & 0xE0) == 0xA0 || (nextByte & 0xF0) == 0x90 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0xD0000000: //3rd-to-last continuation byte, check undefined
return (nextByte & 0xF0) == 0x80 ? currentState << 6 | nextByte & 0x9000003F : -1;
case 0x90000000: //2nd-to-last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xA0000000: //2nd-to-last continuation byte, check overlong
return (nextByte & 0xE0) == 0xA0 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xB0000000: //2nd-to-last continuation byte, check surrogate
return (nextByte & 0xE0) == 0x80 ? currentState << 6 | nextByte & 0xC000003F : -1;
case 0xC0000000: //last continuation byte
return (nextByte & 0xC0) == 0x80 ? currentState << 6 | nextByte & 0x3F : -1;
default:
return -1;
}
}
自动检测编码
如果您的输入流是使用ASCII、ISO-8859-1或UTF-8编码的,但您不确定是哪一种,我们可以使用与上一种方法类似的方法,但使用额外的编码检测组件在返回字符串之前自动检测编码。
public static String autoDetect(InputStream is) throws IOException {
StringBuilder chars = new StringBuilder(Math.max(is.available(), 4096));
byte[] buffer = new byte[4096];
int n;
int state = 0;
boolean ascii = true;
while ((n = is.read(buffer)) != -1) {
for (int i = 0; i < n; i++) {
if ((state = nextStateUtf8(state, buffer[i])) > 0x7F)
ascii = false;
chars.append((char)(buffer[i] & 0xFF));
}
}
if (ascii || state < 0) { //probably not UTF-8
return chars.toString();
}
//probably UTF-8
int pos = 0;
char[] charBuf = new char[2];
for (int i = 0, len = chars.length(); i < len; i++) {
if ((state = nextStateUtf8(state, (byte)chars.charAt(i))) >= 0) {
boolean hi = Character.toChars(state, charBuf, 0) == 2;
chars.setCharAt(pos++, charBuf[0]);
if (hi) {
chars.setCharAt(pos++, charBuf[1]);
}
}
}
return chars.substring(0, pos);
}
如果您的输入流的编码既不是ISO-8859-1,也不是ASCII,也不是UTF-8,那么我就遵从已经存在的其他答案。
这里有一种仅使用标准Java库的方法(请注意,流没有关闭,您的里程可能会有所不同)。
static String convertStreamToString(java.io.InputStream is) {
java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
return s.hasNext() ? s.next() : "";
}
我从“愚蠢的扫描仪技巧”一文中学会了这个技巧。它工作的原因是因为Scanner迭代流中的令牌,在这种情况下,我们使用“输入边界的开始”(\A)来分离令牌,从而为流的整个内容只提供一个令牌。
注意,如果您需要明确输入流的编码,可以向Scanner构造函数提供第二个参数,指示要使用的字符集(例如“UTF-8”)。
雅各布也收到了帽子提示,他曾向我指出了上述文章。
Use:
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.IOException;
public static String readInputStreamAsString(InputStream in)
throws IOException {
BufferedInputStream bis = new BufferedInputStream(in);
ByteArrayOutputStream buf = new ByteArrayOutputStream();
int result = bis.read();
while(result != -1) {
byte b = (byte)result;
buf.write(b);
result = bis.read();
}
return buf.toString();
}