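Extract substring in Bash

Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

To emphasize the point: I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put it into a variable.

I am very interested in the number of different ways this can be accomplished.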

Current answer

Given that test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

cut -b19-20 test.txt > test1.txt  # extracts bytes 19 and 20, "ST"
while read -r; do                 # no ";" after "do"; read stores the line in REPLY
    x=$REPLY
done < test1.txt
echo "$x"
ST
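
The intermediate file and the loop are not strictly needed. A minimal sketch of the same idea using command substitution, assuming a scratch file created with mktemp stands in for test.txt (the tmpfile name is only for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical scratch file so the sketch is self-contained.
tmpfile=$(mktemp)
printf '%s\n' "ABCDEFGHIJKLMNOPQRSTUVWXYZ" > "$tmpfile"

# cut -b19-20 selects bytes 19 and 20 (1-based); $(...) captures the output.
x=$(cut -b19-20 "$tmpfile")
echo "$x"   # prints ST

rm -f "$tmpfile"
```

Note that this still spawns cut as a separate process, unlike the pure-bash answers.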

2016-08-14 19:44:45

Other answers

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of that will also work in ksh93.
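
For anyone who wants to try this, here is a runnable sketch of the two expansions above. The value assigned to input is just the sample name from the question, since the answer leaves input undefined:

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables +(...) patterns in parameter expansion

input="someletters_12345_moreleters.ext"   # sample name from the question

# Strip the longest suffix of the form _<letters>.<anything>
front=${input%%_+([a-zA-Z]).*}
# Strip the longest prefix of the form <letters>_
digits=${front##+([a-zA-Z])_}

echo "$digits"   # prints 12345
```

The intermediate value of front here is someletters_12345, so the second expansion only has to drop the leading letters and underscore.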

2009-01-09 16:13:38

Here's how I'd do it:

FN=someletters_12345_moreleters.ext
[[ ${FN} =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}

Explanation:

Bash-specific:

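[[ ]] indicates a conditional expression
=~ indicates that the condition is a regular expression
&& chains the commands if the prior command was successful

Regular expression (RE): _([[:digit:]]{5})_

_ are literals that demarcate/anchor the matching boundaries for the string being matched
() creates a capture group
[[:digit:]] is a character class; I think it speaks for itself
{5} means exactly five of the preceding character, class (as in this example), or group must match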

In English, you can think of it behaving like this: the FN string is iterated character by character until we see an _, at which point the capture group is opened and we attempt to match five digits. If that matching is successful to this point, the capture group saves the five digits traversed. If the next character is an _, the condition is successful, the capture group is made available in BASH_REMATCH, and the next NUM= statement can execute. If any part of the matching fails, saved details are disposed of and character-by-character processing continues after the _. For example, if FN were _1 _12 _123 _1234 _12345_, there would be four false starts before it found a match.
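
The behaviour described above can be tried with a small sketch. The function name extract_num and the extra sample filenames are assumptions made for illustration; only the regular expression comes from the answer:

```bash
#!/usr/bin/env bash
# extract_num is a hypothetical helper name, not part of the answer above.
extract_num() {
    local fn=$1
    # Succeeds only if exactly five digits sit between two underscores.
    [[ $fn =~ _([[:digit:]]{5})_ ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_num "someletters_12345_moreleters.ext"        # prints 12345
extract_num "a_1_12_123_1234_12345_z.ext"             # prints 12345 after the false starts
extract_num "no_digits_here.ext" || echo "no match"   # prints no match
```

Returning a non-zero status on failure lets the caller decide what to do when a filename does not contain the five-digit group.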

2009-01-12 19:43:20

In case someone wants more rigorous information, you can also search for it in man bash like this

$ man bash [press return key]
/substring  [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]

Result:

${parameter:offset}
       ${parameter:offset:length}
              Substring Expansion.  Expands to  up  to  length  characters  of
              parameter  starting  at  the  character specified by offset.  If
              length is omitted, expands to the substring of parameter  start‐
              ing at the character specified by offset.  length and offset are
              arithmetic expressions (see ARITHMETIC  EVALUATION  below).   If
              offset  evaluates  to a number less than zero, the value is used
              as an offset from the end of the value of parameter.  Arithmetic
              expressions  starting  with  a - must be separated by whitespace
              from the preceding : to be distinguished from  the  Use  Default
              Values  expansion.   If  length  evaluates to a number less than
              zero, and parameter is not @ and not an indexed  or  associative
              array,  it is interpreted as an offset from the end of the value
              of parameter rather than a number of characters, and the  expan‐
              sion is the characters between the two offsets.  If parameter is
              @, the result is length positional parameters beginning at  off‐
              set.   If parameter is an indexed array name subscripted by @ or
              *, the result is the length members of the array beginning  with
              ${parameter[offset]}.   A  negative  offset is taken relative to
              one greater than the maximum index of the specified array.  Sub‐
              string  expansion applied to an associative array produces unde‐
              fined results.  Note that a negative offset  must  be  separated
              from  the  colon  by  at least one space to avoid being confused
              with the :- expansion.  Substring indexing is zero-based  unless
              the  positional  parameters are used, in which case the indexing
              starts at 1 by default.  If offset  is  0,  and  the  positional
              parameters are used, $0 is prefixed to the list.
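
A few of the cases from that man page text, sketched against the filename from the question. The variable name a is an assumption, and the negative-length form needs bash 4.2 or later as far as I know:

```bash
#!/usr/bin/env bash
a="someletters_12345_moreleters.ext"

echo "${a:12:5}"     # offset 12, length 5: prints 12345
echo "${a:12}"       # no length: prints 12345_moreleters.ext
echo "${a: -4}"      # negative offset needs the space: prints .ext
echo "${a:12:-15}"   # negative length = offset from the end: prints 12345
```

Without the space, ${a:-4} would be read as the Use Default Values expansion and would print the whole string.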

2013-05-31 15:00:54

My answer gives you more control over what you want out of your string. Here is the code showing how to extract 12345 from your string

str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str

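This is more useful if you want to extract something that contains characters like abc or special characters like _ or -. For example, if your string looks like this and you want everything after someletters_ and before _moreleters.ext: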

str="someletters_123-45-24a&13b-1_moreleters.ext"

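With my code you can say exactly what you want. Explanation:

#* removes the preceding string, including the matching key. Here the key we mentioned is _
% removes the following string, including the matching key. Here the key we mentioned is _more*

Do some experiments yourself and you will find this interesting.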
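
As one such experiment, here is a sketch that applies the same two expansions to the string with special characters shown above. The variable name middle is only for illustration:

```bash
#!/usr/bin/env bash
str="someletters_123-45-24a&13b-1_moreleters.ext"

middle=${str#*_}          # drop the shortest prefix ending in the first _
middle=${middle%_more*}   # drop the shortest suffix starting at _more

echo "$middle"   # prints 123-45-24a&13b-1
```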

2016-07-29 07:41:26

You can use parameter expansion to do this.

If a is constant, the following parameter expansion performs substring extraction:

b=${a:12:5}

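where 12 is the offset (zero-based) and 5 is the length

If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps: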

tmp=${a#*_}   # remove prefix ending in "_"
b=${tmp%_*}   # remove suffix starting with "_"

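If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.

Both solutions presented are pure bash, with no process spawning involved, hence very fast.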
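
For the case with other underscores, one possible sketch, still pure bash. It assumes the wanted group is the first place where an underscore is directly followed by a digit, and the sample filename and the names prefix and rest are made up for illustration:

```bash
#!/usr/bin/env bash
a="some_letters_12345_more_leters.ext"   # hypothetical name with extra underscores

prefix=${a%%_[0-9]*}    # everything before the first "_<digit>"
rest=${a#"$prefix"_}    # drop that prefix and the underscore after it
b=${rest%%_*}           # keep what comes before the next underscore

echo "$b"   # prints 12345
```

This still takes three steps rather than one expression, so it does not answer the single-expression question.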

2009-01-09 15:52:35
