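Bash tool to get the nth line from a file

Is there a "canonical" way of doing that? I've been using head -n | tail -1, which does the trick, but I've been wondering whether there is a Bash tool that specifically extracts a line (or a range of lines) from a file.

By "canonical" I mean a program whose main function is doing exactly that.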

Current answer

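There are already a lot of good answers. I personally go with awk. For convenience, if you use bash, just add the function below to your ~/.bash_profile. The next time you log in (or once you source your .bash_profile after this update), you will have a nifty new "nth" function available to pipe your files through.

Execute this or put it in your ~/.bash_profile (if using bash) and reopen bash (or execute source ~/.bash_profile):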

# print just the nth piped-in line
nth () { awk -v lnum="$1" 'NR==lnum {print; exit}'; }

Then, to use it, simply pipe through it. For example:

$ yes line | cat -n | nth 5
     5  line
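
Since the question also asks about a range of lines, the same awk idea can be stretched to cover that case. The following is only a sketch: nth_range is a hypothetical helper name that does not appear in the answer, and it assumes the first and last line numbers are passed as two arguments.

```shell
# nth_range is a hypothetical helper name; it is not part of the answer above.
# Print lines FIRST through LAST (inclusive) of piped input, then stop reading.
nth_range () {
    awk -v first="$1" -v last="$2" '
        NR >= first { print }   # print once the first wanted line is reached
        NR == last  { exit }    # stop reading after the last wanted line
    '
}
```

For example, seq 1 100 | nth_range 5 7 prints the lines 5, 6 and 7. As with nth, the exit keeps awk from reading the rest of the input.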

2017-11-17 15:42:57

Other answers

You can also use sed to print and quit:

sed -n '10{p;q;}' file   # print line 10

2011-05-17 11:49:24

With a huge file, piping head into tail will be slow. I would suggest sed like this:

sed 'NUMq;d' file

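Where NUM is the number of the line you want to print; for example, sed '10q;d' file prints the 10th line of file.

Explanation:

NUMq will quit immediately when the line number is NUM.

d will delete the line instead of printing it; this is inhibited on the last line because the q causes the rest of the script to be skipped when quitting.

If you have NUM in a variable, you will want to use double quotes instead of single quotes: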

sed "${NUM}q;d" file
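
Because the double quotes let the shell paste the variable straight into the sed script, anything other than a number in NUM would be interpreted as sed commands. A minimal sketch of a wrapper that checks the value first is shown here; line_at is a hypothetical name and the argument order (NUM, then FILE) is an assumption, not something the answer defines.

```shell
# line_at is a hypothetical wrapper name, not an existing tool.
# Usage: line_at NUM FILE
line_at () {
    # Reject empty values and anything containing a non-digit character.
    case $1 in
        ''|*[!0-9]*)
            echo "line_at: NUM must be a positive integer" >&2
            return 2
            ;;
    esac
    # Line numbers start at 1, so zero is rejected as well.
    if [ "$1" -eq 0 ]; then
        echo "line_at: NUM must be a positive integer" >&2
        return 2
    fi
    sed "${1}q;d" "$2"
}
```

With this in place, line_at 10 file behaves like sed '10q;d' file, while line_at abc file prints an error and returns a non-zero status instead of handing abc to sed.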

2011-05-16 19:38:33

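According to my tests, in terms of performance and readability my recommendation is:

tail -n+N | head -1

N is the line number that you want. For example, tail -n+7 input.txt | head -1 will print the 7th line of the file.

tail -n+N will print everything starting from line N, and head -1 will make it stop after one line.

The alternative head -N | tail -1 is perhaps slightly more readable. For example, this will print the 7th line: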

head -7 input.txt | tail -1
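
One behavioral difference between the two pipelines shows up when the file has fewer than N lines. The sketch below illustrates it with a three-line scratch file; the file and the variable names are made up for the illustration, and the commands use the -n option spelling rather than the shorthand above.

```shell
# Hypothetical scratch file with only three lines.
short=$(mktemp)
printf 'a\nb\nc\n' > "$short"

# head | tail: head passes along all three lines, so tail returns the last one.
from_head_tail=$(head -n 7 "$short" | tail -n 1)

# tail | head: there is no 7th line to start from, so nothing is printed.
from_tail_head=$(tail -n +7 "$short" | head -n 1)

printf 'head | tail: [%s]\n' "$from_head_tail"   # prints: head | tail: [c]
printf 'tail | head: [%s]\n' "$from_tail_head"   # prints: tail | head: []

rm -f "$short"
```

So when the requested line may not exist, tail | head returns nothing, whereas head | tail silently returns the last line of the file.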

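When it comes to performance, there is not much difference for smaller files, but head | tail will be outperformed by tail | head (from above) when the files become huge.

The top-voted sed 'NUMq;d' is interesting to know, but I would argue that fewer people will understand it out of the box than the head/tail solution, and it is also slower than tail/head.

In my tests, both tail/head versions consistently outperformed sed 'NUMq;d'. That is in line with the other benchmarks that were posted. It is hard to find a case where tail/head was really bad. That is not surprising either, as these are operations you would expect to be heavily optimized on a modern Unix system.

To give an idea of the performance differences, these are the numbers I get for a huge file (9.3G):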

tail -n+N | head -1: 3.7 sec
head -N | tail -1: 4.6 sec
sed Nq;d: 18.8 sec

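Results may differ, but in general the performance of head | tail and tail | head is comparable for smaller inputs, and sed is always slower by a significant factor (around 5x or so).

To reproduce my benchmark, you can try the following, but be warned that it will create a 9.3G file in the current working directory: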

#!/bin/bash
readonly file=tmp-input.txt
readonly size=1000000000
readonly pos=500000000
readonly retries=3

seq 1 $size > $file
echo "*** head -N | tail -1 ***"
for i in $(seq 1 $retries) ; do
    time head "-$pos" $file | tail -1
done
echo "-------------------------"
echo
echo "*** tail -n+N | head -1 ***"
echo

seq 1 $size > $file
ls -alhg $file
for i in $(seq 1 $retries) ; do
    time tail -n+$pos $file | head -1
done
echo "-------------------------"
echo
echo "*** sed Nq;d ***"
echo

seq 1 $size > $file
ls -alhg $file
for i in $(seq 1 $retries) ; do
    time sed $pos'q;d' $file
done
/bin/rm $file

Here is the output of a run on my machine (ThinkPad X1 Carbon with an SSD and 16G of memory). I assume that in the final run everything comes from the cache, not from disk:

*** head -N | tail -1 ***
500000000

real    0m9,800s
user    0m7,328s
sys     0m4,081s
500000000

real    0m4,231s
user    0m5,415s
sys     0m2,789s
500000000

real    0m4,636s
user    0m5,935s
sys     0m2,684s
-------------------------

*** tail -n+N | head -1 ***

-rw-r--r-- 1 phil 9,3G Jan 19 19:49 tmp-input.txt
500000000

real    0m6,452s
user    0m3,367s
sys     0m1,498s
500000000

real    0m3,890s
user    0m2,921s
sys     0m0,952s
500000000

real    0m3,763s
user    0m3,004s
sys     0m0,760s
-------------------------

*** sed Nq;d ***

-rw-r--r-- 1 phil 9,3G Jan 19 19:50 tmp-input.txt
500000000

real    0m23,675s
user    0m21,557s
sys     0m1,523s
500000000

real    0m20,328s
user    0m18,971s
sys     0m1,308s
500000000

real    0m19,835s
user    0m18,830s
sys     0m1,004s
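
For anyone who wants to try the three commands without creating a 9.3G file, a scaled-down sketch follows. It only checks that the three approaches return the same line; the size and pos values and the variable names are illustrative assumptions, not the ones measured above, and timings on a file this small say little about the huge-file case.

```shell
#!/bin/bash
# Scaled-down sketch: size and pos are illustrative, not the benchmarked values.
file=$(mktemp)
size=1000000
pos=500000

seq 1 "$size" > "$file"

# Fetch the same line with each of the three approaches.
via_head_tail=$(head -n "$pos" "$file" | tail -n 1)
via_tail_head=$(tail -n "+$pos" "$file" | head -n 1)
via_sed=$(sed "${pos}q;d" "$file")

rm -f "$file"

if [ "$via_head_tail" = "$via_tail_head" ] && [ "$via_tail_head" = "$via_sed" ]; then
    echo "all three approaches agree: $via_sed"
else
    echo "mismatch: '$via_head_tail' '$via_tail_head' '$via_sed'" >&2
fi
```

Prefixing each of the three commands with time, as the script above does, gives a rough comparison at whatever size is chosen.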

2017-07-31 13:10:02

This is not a bash solution, but I found that the top choices didn't satisfy my needs. For example,

sed 'NUMq;d' file

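was fast enough, but it hung for hours without reporting any progress. I suggest compiling this C++ program and using it to find the row you want. You can compile it with g++ main.cpp, where main.cpp is the file with the content below. I got a.out and executed it with ./a.out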

#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main() {
    string filename;
    cout << "Enter filename ";
    cin >> filename;

    long long needed_row_number;  // 64-bit: huge files can exceed the int range
    cout << "Enter row number ";
    cin >> needed_row_number;

    long long progress_line_count;
    cout << "Enter at which every number of rows to monitor progress ";
    cin >> progress_line_count;

    char ch;
    long long row_counter = 1;
    fstream fin(filename, fstream::in);
    // Read byte by byte; noskipws keeps whitespace, including newlines.
    while (fin >> noskipws >> ch) {
        if (row_counter == needed_row_number) {
            cout << ch;
        }
        if (ch == '\n') {
            if (row_counter == needed_row_number) {
                return 0;
            }
            row_counter++;
            // Skip progress output for a zero or negative interval (avoids modulo by zero).
            if (progress_line_count > 0 && row_counter % progress_line_count == 0) {
                cout << "Progress: line " << row_counter << endl;
            }
        }

    }
    return 0;
}

2022-07-21 10:25:11
