Number of lines in a file in Java

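I work with huge data files, and sometimes I only need to know how many lines they contain. Usually I open a file and read it line by line until I reach the end.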

I was wondering whether there is a smarter way to do this.

Current answer

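This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file it takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.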

public static int countLinesOld(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

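Edit, 9 1/2 years later: I have practically no Java experience, but I tried to benchmark this code against the LineNumberReader solution below anyway, since it bothered me that nobody had done it. It seems that for large files my solution is faster, although it appears to take a few runs until the optimizer does a decent job. I played with the code a bit and produced a new version that is consistently the fastest: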

public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        
        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }
        
        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i=0; i<1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        
        // count remaining characters
        while (readChars != -1) {
            for (int i=0; i<readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }
        
        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}

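Benchmark results for a 1.3GB text file, y-axis in seconds. I performed 100 runs on the same file and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers and countLinesNew has none; although it is only slightly faster, the difference is statistically significant. LineNumberReader is clearly slower.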
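The measurement approach described above (repeated runs, each timed with System.nanoTime()) could be sketched roughly as follows. This is not the benchmark code behind the results above; the class name TimingSketch, the helpers timeRuns and median, and the in-memory byte array standing in for a real file are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

class TimingSketch {
    // Accumulates results so the JIT is less likely to discard the measured work.
    static long sink = 0;

    /** Runs the task `runs` times and returns each run's elapsed time in nanoseconds. */
    static long[] timeRuns(LongSupplier task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    /** Returns the middle element of the sorted values (the upper middle for even lengths). */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload: count '\n' bytes in a 1 MiB buffer.
        byte[] data = new byte[1 << 20];
        Arrays.fill(data, (byte) '\n');
        LongSupplier countNewlines = () -> {
            long count = 0;
            for (byte b : data) {
                if (b == '\n') {
                    count++;
                }
            }
            return count;
        };
        long[] elapsed = timeRuns(countNewlines, 100);
        System.out.println("median ns per run: " + median(elapsed));
    }
}
```

Looking at the per-run times rather than a single total makes warm-up effects and outliers visible, which is what the comparison above relies on. A dedicated harness such as JMH is generally a more reliable choice for serious measurements.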

2009-01-17 09:35:17

Other answers

/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {

    try (Stream<String> lines = Files.lines(file.toPath())) {
        return lines.count();
    }
}

Tested on JDK8_u31. However, it is indeed slower than this method:

/**
 * Count file rows.
 *
 * @param file file
 * @return file row count
 * @throws IOException
 */
public static long getLineCount(File file) throws IOException {

    try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {

        byte[] c = new byte[1024];
        boolean empty = true,
                lastEmpty = false;
        long count = 0;
        int read;
        while ((read = is.read(c)) != -1) {
            for (int i = 0; i < read; i++) {
                if (c[i] == '\n') {
                    count++;
                    lastEmpty = true;
                } else if (lastEmpty) {
                    lastEmpty = false;
                }
            }
            empty = false;
        }

        if (!empty) {
            if (count == 0) {
                count = 1;
            } else if (!lastEmpty) {
                count++;
            }
        }

        return count;
    }
}

Tested, and it is very fast.

2015-02-20 22:13:37

If you use this

public int countLines(String filename) throws IOException {
    // try-with-resources closes the reader even if readLine() throws
    try (LineNumberReader reader = new LineNumberReader(new FileReader(filename))) {
        // Read and discard every line; the reader keeps track of the line number
        while (reader.readLine() != null) {}
        return reader.getLineNumber();
    }
}

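you cannot handle files with a very large number of lines, because reader.getLineNumber() returns an int, which tops out at 2,147,483,647. You need a long to count more lines than that.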
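One way around the int limit is to keep the counter yourself in a long. The following is a minimal sketch, assuming a plain BufferedReader is acceptable in place of LineNumberReader; the class name LongLineCounter and the StringReader input in main are illustrative only, and a FileReader would be passed in for a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LongLineCounter {
    /** Counts lines with a long counter, so the result is not capped at Integer.MAX_VALUE. */
    static long countLines(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            long count = 0;
            // readLine() treats \n, \r and \r\n as line terminators
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Three lines; the last one has no trailing terminator
        System.out.println(countLines(new StringReader("a\nb\r\nc"))); // prints 3
    }
}
```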

2010-12-13 03:23:15

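I know this is an old question, but the accepted solution didn't quite match what I needed. So I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):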

public static long getLinesCount(String fileName, String encodingName) throws IOException {
    long linesCount = 0;
    File file = new File(fileName);
    FileInputStream fileIn = new FileInputStream(file);
    try {
        Charset encoding = Charset.forName(encodingName);
        Reader fileReader = new InputStreamReader(fileIn, encoding);
        int bufferSize = 4096;
        Reader reader = new BufferedReader(fileReader, bufferSize);
        char[] buffer = new char[bufferSize];
        int prevChar = -1;
        int readCount = reader.read(buffer);
        while (readCount != -1) {
            for (int i = 0; i < readCount; i++) {
                int nextChar = buffer[i];
                switch (nextChar) {
                    case '\r': {
                        // The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
                        linesCount++;
                        break;
                    }
                    case '\n': {
                        if (prevChar == '\r') {
                            // The current line is terminated by a carriage return immediately followed by a line feed.
                            // The line has already been counted.
                        } else {
                            // The current line is terminated by a line feed.
                            linesCount++;
                        }
                        break;
                    }
                }
                prevChar = nextChar;
            }
            readCount = reader.read(buffer);
        }
        if (prevChar != -1) {
            switch (prevChar) {
                case '\r':
                case '\n': {
                    // The last line is terminated by a line terminator.
                    // The last line has already been counted.
                    break;
                }
                default: {
                    // The last line is terminated by end-of-file.
                    linesCount++;
                }
            }
        }
    } finally {
        fileIn.close();
    }
    return linesCount;
}

This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).

2012-09-21 20:27:57

In Java 8, you can use streams:

try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
  long numOfLines = lines.count();
  ...
}

2013-07-25 19:07:54

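If you don't have any index structure, you won't get around reading the complete file. But you can optimize it by avoiding reading line by line and using a regex to match all line terminators.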
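A rough sketch of the regex idea might look like this. It assumes the whole file fits in memory, which is a simplification that does not hold for truly huge files; there the text would have to be scanned in chunks, with care for a \r\n pair split across a chunk boundary. The class name RegexLineCount and the temporary file are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineCount {
    // \r\n is listed first so a Windows line ending counts as one terminator
    private static final Pattern TERMINATOR = Pattern.compile("\r\n|\r|\n");

    /** Counts line terminators (\r\n, \r or \n) in the given text. */
    static long countTerminators(CharSequence text) {
        Matcher matcher = TERMINATOR.matcher(text);
        long count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, "a\r\nb\nc\rd".getBytes(StandardCharsets.UTF_8));
            // Assumption: the file is small enough to load completely
            String text = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            System.out.println(countTerminators(text)); // prints 3
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Note that this counts terminators, not lines: a final line without a trailing terminator (the "d" above) is not included, so add one if that convention is needed.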

2009-01-17 09:36:41
