我使用巨大的数据文件,有时我只需要知道这些文件中的行数,通常我打开它们,一行一行地读取它们,直到我到达文件的末尾
我在想有没有更聪明的办法
我使用巨大的数据文件,有时我只需要知道这些文件中的行数,通常我打开它们,一行一行地读取它们,直到我到达文件的末尾
我在想有没有更聪明的办法
当前回答
我已经实现了这个问题的另一个解决方案,我发现它在计算行数时更有效:
try
(
FileReader input = new FileReader("input.txt");
LineNumberReader count = new LineNumberReader(input);
)
{
while (count.skip(Long.MAX_VALUE) > 0)
{
// Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
}
result = count.getLineNumber() + 1; // +1 because line index starts at 0
}
其他回答
似乎有几种不同的方法可以使用LineNumberReader。
我是这样做的:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
String line = count.readLine();
if(count.ready())
{
while(line != null) {
lines = count.getLineNumber();
line = count.readLine();
}
lines+=1;
}
count.close();
System.out.println(lines);
更简单的是,可以使用Java BufferedReader lines()方法返回元素流,然后使用stream count()方法对所有元素进行计数。然后只需在输出中添加1,就可以得到文本文件中的行数。
为例:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int)count.lines().count() + 1;
count.close();
System.out.println(lines);
EOF处没有换行符('\n')的多行文件的最佳优化代码。
/**
*
* @param filename
* @return
* @throws IOException
*/
public static int countLines(String filename) throws IOException {
int count = 0;
boolean empty = true;
FileInputStream fis = null;
InputStream is = null;
try {
fis = new FileInputStream(filename);
is = new BufferedInputStream(fis);
byte[] c = new byte[1024];
int readChars = 0;
boolean isLine = false;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if ( c[i] == '\n' ) {
isLine = false;
++count;
}else if(!isLine && c[i] != '\n' && c[i] != '\r'){ //Case to handle line count where no New Line character present at EOF
isLine = true;
}
}
}
if(isLine){
++count;
}
}catch(IOException e){
e.printStackTrace();
}finally {
if(is != null){
is.close();
}
if(fis != null){
fis.close();
}
}
LOG.info("count: "+count);
return (count == 0 && !empty) ? 1 : count;
}
如果你用这个
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
你不能运行到大num行,比如100K行,因为从读取器返回。getLineNumber是int。你需要长类型的数据来处理最多的行。
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (Stream<String> lines = Files.lines(file.toPath())) {
return lines.count();
}
}
在JDK8_u31上测试。但与此方法相比,性能确实较慢:
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
byte[] c = new byte[1024];
boolean empty = true,
lastEmpty = false;
long count = 0;
int read;
while ((read = is.read(c)) != -1) {
for (int i = 0; i < read; i++) {
if (c[i] == '\n') {
count++;
lastEmpty = true;
} else if (lastEmpty) {
lastEmpty = false;
}
}
empty = false;
}
if (!empty) {
if (count == 0) {
count = 1;
} else if (!lastEmpty) {
count++;
}
}
return count;
}
}
经过测试,非常快。
上面的count()方法给出的答案是,如果文件末尾没有换行符,就会出现行数错误——它无法计算文件中的最后一行。
这个方法更适合我:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}