如何以最有效的内存和时间方式获取大文件的行数?

def file_len(filename):
    with open(filename) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

当前回答

我得到了一个小(4-8%)的改进,这个版本重用了一个常量缓冲区,所以它应该避免任何内存或GC开销:

lines = 0
buffer = bytearray(2048)
with open(filename) as f:
  while f.readinto(buffer) > 0:
      lines += buffer.count('\n')

您可以调整缓冲区大小,可能会看到一些改进。

其他回答

您可以执行子进程并运行wc -l filename

import subprocess

def file_len(fname):
    p = subprocess.Popen(['wc', '-l', fname], stdout=subprocess.PIPE, 
                                              stderr=subprocess.PIPE)
    result, err = p.communicate()
    if p.returncode != 0:
        raise IOError(err)
    return int(result.strip().split()[0])
def file_len(full_path):
  """ Count number of lines in a file."""
  f = open(full_path)
  nr_of_lines = sum(1 for line in f)
  f.close()
  return nr_of_lines

这段代码更短、更清晰。这可能是最好的方法:

num_lines = open('yourfile.ext').read().count('\n')

我发现你可以。

f = open("data.txt")
linecout = len(f.readlines())

会给你答案吗

def line_count(path):
    count = 0
    with open(path) as lines:
        for count, l in enumerate(lines, start=1):
            pass
    return count