How can I parse a YAML file from a Linux shell script?

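I wish to provide a structured configuration file that is as easy as possible for a non-technical user to edit (unfortunately it has to be a file), so I wanted to use YAML. However, I can't find any way of parsing this from a Unix shell script.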

Current answer

Here is an extended version of Stefan Farestam's answer:

function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|,$s\]$s\$|]|" \
        -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1  - \4|;t1" \
        -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1  - \3|;p" $1 | \
   sed -ne "s|,$s}$s\$|}|" \
        -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1  \3: \4|;t1" \
        -e    "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1  \2|;p" | \
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
      if(length($2)== 0){  vname[indent]= ++idx[indent] };
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
      }
   }'
}

This version supports the - notation and the short notation for dictionaries and lists. The following input:

global:
  input:
    - "main.c"
    - "main.h"
  flags: [ "-O3", "-fpic" ]
  sample_input:
    -  { property1: value, property2: "value2" }
    -  { property1: "value3", property2: 'value 4' }

produces this output:

global_input_1="main.c"
global_input_2="main.h"
global_flags_1="-O3"
global_flags_2="-fpic"
global_sample_input_1_property1="value"
global_sample_input_1_property2="value2"
global_sample_input_2_property1="value3"
global_sample_input_2_property2="value 4"

As you can see, the - items are automatically numbered so that each item gets a distinct variable name. Bash has no multidimensional arrays, so this is one way to work around that. Multiple levels are supported. To work around the problem with trailing whitespace mentioned by @briceburg, enclose the values in single or double quotes. There are still some limitations, however: expanding dictionaries and lists can produce wrong results when values contain commas, and more complex structures such as values spanning multiple lines (like ssh-keys) are not (yet) supported.
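A minimal sketch of how a script might walk the numbered variables once the output above has been eval-ed. The helper name list_items is hypothetical (it is not part of the answer's code), and the two assignments stand in for what eval would have created:

```bash
#!/usr/bin/env bash
# Sketch: print the values of <base>_1, <base>_2, ... one per line,
# stopping at the first number that has no variable.
# `list_items` is a hypothetical helper, not part of parse_yaml.
list_items() {
   local base=$1 i=1 name
   while :; do
      name="${base}_${i}"
      # ${!name+x} expands to "x" only if the variable named by $name is set
      [ -n "${!name+x}" ] || break
      printf '%s\n' "${!name}"
      i=$((i + 1))
   done
}

# Stand-ins for the variables created by eval-ing the parser output
global_input_1="main.c"
global_input_2="main.h"

list_items global_input
```

This relies on bash indirect expansion (${!name}), so it assumes bash rather than a plain POSIX sh.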

A few words about the code: The first sed call expands the short notation of lists, converting [ entry, ... ] into an itemized list with the - notation. The second sed call does the same for the short form of dictionaries { key: value, ... } inside list items, converting them to the simpler regular YAML style. The third sed call is the original one that handled normal dictionaries, now extended to handle lists with - and indentation. The awk part introduces an index for each indentation level and increments it when the variable name is empty (i.e. when processing a list item). The current value of the counter is used in place of the empty vname. When going up one level, the counters are reset to zero.

Edit: I have created a GitHub repository for this.

2018-08-10 15:24:05

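Other answers

I just wrote a parser that I called Yay! (Yaml ain't Yamlesque!) which parses Yamlesque, a small subset of YAML. So, if you're looking for a 100% compliant YAML parser for Bash, then this isn't it. However, to quote the OP, if you want a structured configuration file that is as easy as possible for a non-technical user to edit and that is YAML-like, this may be of interest.

It's inspired by the earlier answer but writes associative arrays (yes, it requires Bash 4.x) instead of basic variables. It does so in a way that allows the data to be parsed without prior knowledge of the keys, so that data-driven code can be written.

As well as the key/value array elements, each array has a keys entry containing a list of key names, a children entry containing the names of child arrays, and a parent entry that refers to its parent.

This is an example of Yamlesque: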

root_key1: this is value one
root_key2: "this is value two"

drink:
  state: liquid
  coffee:
    best_served: hot
    colour: brown
  orange_juice:
    best_served: cold
    colour: orange

food:
  state: solid
  apple_pie:
    best_served: warm

root_key_3: this is value three

Here is an example showing how to use it:

#!/bin/bash
# An example showing how to use Yay

. /usr/lib/yay

# helper to get array value at key
value() { eval echo \${$1[$2]}; }

# print a data collection
print_collection() {
  for k in $(value $1 keys)
  do
    echo "$2$k = $(value $1 $k)"
  done

  for c in $(value $1 children)
  do
    echo -e "$2$c\n$2{"
    print_collection $c "  $2"
    echo "$2}"
  done
}

yay example
print_collection example

which outputs:

root_key1 = this is value one
root_key2 = this is value two
root_key_3 = this is value three
example_drink
{
  state = liquid
  example_coffee
  {
    best_served = hot
    colour = brown
  }
  example_orange_juice
  {
    best_served = cold
    colour = orange
  }
}
example_food
{
  state = solid
  example_apple_pie
  {
    best_served = warm
  }
}

And here is the parser:

yay_parse() {

   # find input file
   for f in "$1" "$1.yay" "$1.yml"
   do
     [[ -f "$f" ]] && input="$f" && break
   done
   [[ -z "$input" ]] && exit 1

   # use given dataset prefix or imply from file name
   [[ -n "$2" ]] && local prefix="$2" || {
     local prefix=$(basename "$input"); prefix=${prefix%.*}
   }

   echo "declare -g -A $prefix;"

   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
          -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |
   awk -F$fs '{
      indent       = length($1)/2;
      key          = $2;
      value        = $3;

      # No prefix or parent for the top level (indent zero)
      root_prefix  = "'$prefix'_";
      if (indent ==0 ) {
        prefix = "";          parent_key = "'$prefix'";
      } else {
        prefix = root_prefix; parent_key = keys[indent-1];
      }

      keys[indent] = key;

      # remove keys left behind if prior row was indented more than this row
      for (i in keys) {if (i > indent) {delete keys[i]}}

      if (length(value) > 0) {
         # value
         printf("%s%s[%s]=\"%s\";\n", prefix, parent_key , key, value);
         printf("%s%s[keys]+=\" %s\";\n", prefix, parent_key , key);
      } else {
         # collection
         printf("%s%s[children]+=\" %s%s\";\n", prefix, parent_key , root_prefix, key);
         printf("declare -g -A %s%s;\n", root_prefix, key);
         printf("%s%s[parent]=\"%s%s\";\n", root_prefix, key, prefix, parent_key);
      }
   }'
}

# helper to load yay data file
yay() { eval $(yay_parse "$@"); }

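There's some documentation in the linked source file, and below is a short explanation of what the code does.

The yay_parse function first locates the input file or exits with an exit status of 1. Next, it determines the dataset prefix, either explicitly specified or derived from the file name.

It writes valid bash commands to its standard output that, if executed, define arrays representing the contents of the input data file. The first of these defines the top-level array: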

echo "declare -g -A $prefix;"

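Note that array declarations are associative (-A), which is a feature of Bash version 4. Declarations are also global (-g), so they can be executed inside a function yet remain available in the global scope, as in the yay helper: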

yay() { eval $(yay_parse "$@"); }

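The input data is initially processed with sed. It drops lines that don't match the Yamlesque format specification, then delimits the valid Yamlesque fields with an ASCII File Separator character and removes any double quotes surrounding the value field.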

 local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
 sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |

The two expressions are similar; they differ only in that the first one picks out quoted values whereas the second one picks out unquoted ones.
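To see what the two expressions do to individual lines, here is a small sketch that runs them on three sample lines. The function name split_fields and the use of tr to turn the separator into a visible | are my own additions for illustration; printf '\034' is used here in place of the original echo/tr idiom to produce the same byte:

```bash
#!/usr/bin/env bash
# Same patterns as in yay_parse; fs is the ASCII File Separator (octal 034)
s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(printf '\034')

# Hypothetical helper: apply both expressions to stdin and show the
# separator as "|" so the three fields (indent, key, value) are visible
split_fields() {
   sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
          -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" |
   tr "$fs" '|'
}

# A quoted value, a line that is not Yamlesque, and an unquoted value
printf '%s\n' '  header: "debugging started"' '# a comment' 'state: liquid' |
   split_fields
```

The comment line is dropped, the quoted value loses its quotes, and each surviving line becomes indent|key|value.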

The File Separator (decimal 28, hex 1C, octal 034) is used because, as a non-printable character, it is unlikely to appear in the input data.
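A quick way to check those numbers from the shell, as a sketch. The helper name fs_codes is made up for this example; it relies on the standard printf behaviour that a numeric argument starting with a quote yields the code of the following character:

```bash
#!/usr/bin/env bash
# Produce the File Separator byte
fs=$(printf '\034')

# Hypothetical helper: print the decimal, hex and octal code of a character
fs_codes() { printf '%d %x %o\n' "'$1" "'$1" "'$1"; }

fs_codes "$fs"
```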

The result is piped into awk, which processes its input one line at a time. It uses the FS character to assign each field to a variable:

indent       = length($1)/2;
key          = $2;
value        = $3;

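All lines have an indent (possibly zero) and a key, but they don't all have a value. It computes an indent level for the line by dividing the length of the first field, which contains the leading whitespace, by two. Top-level items without any indent are at indent level zero.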
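The indent calculation can be tried in isolation. This is only a sketch; indent_level is a hypothetical wrapper around the same length/2 arithmetic, and it assumes two spaces per level as the parser does:

```bash
#!/usr/bin/env bash
# Hypothetical helper: indent level for a given leading-whitespace string,
# using the same arithmetic as the parser (two spaces per level assumed)
indent_level() {
   awk -v ws="$1" 'BEGIN { print length(ws) / 2 }'
}

indent_level ''       # top level
indent_level '  '     # two spaces
indent_level '    '   # four spaces
```

An odd number of spaces gives a fractional level (three spaces yields 1.5), so input files should stick to two-space indentation.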

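Next, it works out what prefix to use for the current item. This is what gets added to a key name to make an array name. There's a root_prefix for the top-level array, which is defined as the dataset name and an underscore: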

root_prefix  = "'$prefix'_";
if (indent ==0 ) {
  prefix = "";          parent_key = "'$prefix'";
} else {
  prefix = root_prefix; parent_key = keys[indent-1];
}

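The parent_key is the key at the indent level above the current line's indent level and represents the collection that the current line is part of. The collection's key/value pairs will be stored in an array whose name is the concatenation of the prefix and parent_key.

For the top level (indent level zero) the dataset prefix is used as the parent key, so it has no prefix (it's set to ""). All other arrays are prefixed with the root prefix.

Next, the current key is inserted into an (awk-internal) array containing the keys. This array persists throughout the whole awk session and therefore contains keys inserted by prior lines. The key is inserted into the array using its indent as the array index.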

keys[indent] = key;

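Because this array contains keys from previous lines, any keys with an indent level greater than the current line's indent level are removed: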

 for (i in keys) {if (i > indent) {delete keys[i]}}

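This leaves the keys array containing the key chain from the root at indent level 0 to the current line. It removes stale keys that remain when the prior line was indented deeper than the current line.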
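The pruning step can be illustrated on its own. In this sketch the input is simplified to "indent key" pairs (my own format, not what the parser receives) and key_paths is a hypothetical helper that prints the key chain after each line:

```bash
#!/usr/bin/env bash
# Hypothetical helper. Input: one "<indent> <key>" pair per line.
# Output: the key chain from the root to the current line, joined with dots.
key_paths() {
   awk '{
      indent = $1
      keys[indent] = $2
      # drop stale keys left over from more deeply indented earlier lines
      for (i in keys) { if (i + 0 > indent) delete keys[i] }
      path = keys[0]
      for (i = 1; i <= indent; i++) path = path "." keys[i]
      print path
   }'
}

printf '%s\n' '0 drink' '1 coffee' '2 colour' '1 state' '0 food' | key_paths
```

When the "1 state" line arrives, the level-2 entry (colour) is deleted, so the chain printed is drink.state rather than anything containing colour.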

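The final section outputs the bash commands: an input line without a value starts a new indent level (a collection in YAML parlance), and an input line with a value adds a key to the current collection.

The collection's name is the concatenation of the current line's prefix and parent_key.

When a key has a value, a key with that value is assigned to the current collection like this: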

printf("%s%s[%s]=\"%s\";\n", prefix, parent_key , key, value);
printf("%s%s[keys]+=\" %s\";\n", prefix, parent_key , key);

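The first statement outputs the command that assigns the value to an associative array element named after the key, and the second one outputs the command that adds the key to the collection's space-delimited keys list: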

<current_collection>[<key>]="<value>";
<current_collection>[keys]+=" <key>";

When a key doesn't have a value, a new collection is started like this:

printf("%s%s[children]+=\" %s%s\";\n", prefix, parent_key , root_prefix, key);
printf("declare -g -A %s%s;\n", root_prefix, key);

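The first statement outputs the command that adds the new collection to the current collection's space-delimited children list, and the second one outputs the command that declares a new associative array for the new collection: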

<current_collection>[children]+=" <new_collection>"
declare -g -A <new_collection>;

All of the output from yay_parse can be parsed as bash commands by the bash eval or source built-in commands.
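As a rough illustration of that last point, the sketch below evals a hand-written set of commands of the kind described above and then reads the arrays back. The command text and the load_example helper are made up for the example (a real run would take them from yay_parse), and declare -g assumes bash 4.2 or later:

```bash
#!/usr/bin/env bash
# Hand-written commands of the kind yay_parse emits, for a tiny made-up input
emitted='declare -g -A example;
example[root_key1]="this is value one";
example[keys]+=" root_key1";
example[children]+=" example_drink";
declare -g -A example_drink;
example_drink[parent]="example";
example_drink[state]="liquid";
example_drink[keys]+=" state";'

# Hypothetical loader: eval runs inside a function, but -g keeps the
# arrays in the global scope
load_example() { eval "$emitted"; }
load_example

echo "${example[root_key1]}"
echo "${example_drink[parent]}"
# Walk the child collection without knowing its keys in advance
for k in ${example_drink[keys]}; do
   echo "$k = ${example_drink[$k]}"
done
```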

2015-07-29 15:48:04

It's possible to pass a small script to some interpreters, like Python. An easy way to do so using Ruby and its YAML library is the following:

$ RUBY_SCRIPT="data = YAML::load(STDIN.read); puts data['a']; puts data['b']"
$ echo -e '---\na: 1234\nb: 4321' | ruby -ryaml -e "$RUBY_SCRIPT"
1234
4321

where data is a hash (or array) with the values from the YAML.

As a bonus, it'll parse Jekyll's front matter just fine.

ruby -ryaml -e "puts YAML::load(open(ARGV.first).read)['tags']" example.md

2012-02-11 20:02:49

Here is a bash-only parser that leverages sed and awk to parse simple YAML files:

function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}

It understands files such as:

## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"

## output
output:
  file: "yes"

which, when parsed using:

parse_yaml sample.yml

will output:

global_debug="yes"
global_verbose="no"
global_debugging_detailed="no"
global_debugging_header="debugging started"
output_file="yes"

It also understands YAML files generated by Ruby, which may include Ruby symbols, like:

---
:global:
  :debug: 'yes'
  :verbose: 'no'
  :debugging:
    :detailed: 'no'
    :header: debugging started
  :output: 'yes'

and will output the same as in the previous example.

Typical use within a script is:

eval $(parse_yaml sample.yml)

parse_yaml accepts a prefix argument so that imported settings all have a common prefix (which reduces the risk of namespace collisions).

parse_yaml sample.yml "CONF_"

yields:

CONF_global_debug="yes"
CONF_global_verbose="no"
CONF_global_debugging_detailed="no"
CONF_global_debugging_header="debugging started"
CONF_output_file="yes"
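One thing a common prefix makes easy, shown here as a sketch: bash can list every variable whose name starts with that prefix, so a script can dump or validate all imported settings without knowing the keys. The two assignments stand in for what eval $(parse_yaml sample.yml "CONF_") would have created:

```bash
#!/usr/bin/env bash
# Stand-ins for variables created by eval-ing the prefixed parser output
CONF_global_debug="yes"
CONF_output_file="yes"

# "${!CONF_@}" expands to the names of all variables starting with CONF_;
# "${!name}" then fetches the value of the variable called $name
for name in "${!CONF_@}"; do
   printf '%s=%s\n' "$name" "${!name}"
done
```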

Note that earlier settings in a file can be referred to by later settings:

## global definitions
global:
  debug: yes
  verbose: no
  debugging:
    detailed: no
    header: "debugging started"

## output
output:
  debug: $global_debug
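The reason this works is that eval expands $global_debug at the moment the later assignment runs. A minimal sketch, with the generated text written by hand (reduced to the two relevant lines) instead of coming from parse_yaml:

```bash
#!/usr/bin/env bash
# Lines of the kind parse_yaml would print for the file above
generated='global_debug="yes"
output_debug="$global_debug"'

# By the time the second assignment is evaluated, global_debug is already set
eval "$generated"
echo "$output_debug"
```

The same mechanism means that a value containing a command substitution such as $(...) would also be executed by eval, so this approach is only suitable for configuration files you trust.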

Another nice usage is to first parse a defaults file and then the user settings, which works because the later settings override the earlier ones:

eval $(parse_yaml defaults.yml)
eval $(parse_yaml project.yml)

2014-01-17 15:03:49

Moving my answer over from How to convert a json response to yaml in bash, since this seems to be the authoritative post on parsing YAML text from the command line.

I would like to add details about the yq YAML implementations. Since there are two implementations of this YAML parser lying around, both named yq, it is hard to tell which one is in use without looking at the implementation's DSL. The two available implementations are

1. kislyuk/yq - the more often talked-about version, a wrapper over jq, written in Python, using the PyYAML library for YAML parsing
2. mikefarah/yq - a Go implementation with its own dynamic DSL, using the go-yaml v3 parser

Both are available for installation via standard package managers on almost all major distributions

1. kislyuk/yq - Installation instructions
2. mikefarah/yq - Installation instructions

Both versions have some pros and cons over the other, but here are a few valid points to highlight (adopted from their repo instructions)

kislyuk/yq

1. Since the DSL is adopted completely from jq, parsing and manipulation become quite straightforward for users familiar with the latter
2. Supports a mode to preserve YAML tags and styles, but loses comments during the conversion. Since jq doesn't preserve comments, the comments are lost during the round-trip conversion.
3. As part of the package, XML support is built in. An executable, xq, transcodes XML to JSON using xmltodict and pipes it to jq, on which you can apply the same DSL to perform CRUD operations on the objects and round-trip the output back to XML.
4. Supports in-place edit mode with the -i flag (similar to sed -i)

mikefarah/yq

1. Prone to frequent changes in DSL, migration from 2.x - 3.x
2. Rich support for anchors, styles and tags. But look out for bugs once in a while
3. A relatively simple path expression syntax to navigate and match YAML nodes
4. Supports YAML->JSON, JSON->YAML formatting and pretty printing YAML (with comments)
5. Supports in-place edit mode with the -i flag (similar to sed -i)
6. Supports coloring the output YAML with the -C flag (not applicable for JSON output) and indentation of the sub elements (default at 2 spaces)
7. Supports shell completion for most shells - Bash, zsh (because of powerful support from spf13/cobra used to generate CLI flags)

My take on the following YAML (referenced in other answers as well) with both the versions

root_key1: this is value one
root_key2: "this is value two"

drink:
  state: liquid
  coffee:
    best_served: hot
    colour: brown
  orange_juice:
    best_served: cold
    colour: orange

food:
  state: solid
  apple_pie:
    best_served: warm

root_key_3: this is value three

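Various actions to be performed with both the implementations (some frequently used operations)

1. Modifying a node value at root level - change the value of root_key2
2. Modifying array contents, adding a value - add a property to coffee
3. Modifying array contents, deleting a value - delete a property from orange_juice
4. Printing key/value pairs with paths - for all items under food

Using kislyuk/yq

yq -y '.root_key2 |= "this is a new value"' yaml
yq -y '.drink.coffee += { time: "always" }' yaml
yq -y 'del(.drink.orange_juice.colour)' yaml
yq -r '.food|paths(scalars) as $p | [($p|join(".")), (getpath($p)|tojson)] | @tsv' yaml

Which is pretty straightforward. All you need is to transcode jq JSON output back into YAML with the -y flag.

Using mikefarah/yq

yq w yaml root_key2 "this is a new value"
yq w yaml drink.coffee.time "always"
yq d yaml drink.orange_juice.colour
yq r yaml --printMode pv "food.**"

As of Dec 21st 2020, yq v4 is in beta and supports much more powerful path expressions, with a DSL similar to jq's. Read the transition notes - Upgrading from V3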

2020-11-08 19:58:47

A quick way to do this now (the previous ones didn't work for me):

sudo wget https://github.com/mikefarah/yq/releases/download/v4.4.1/yq_linux_amd64 -O /usr/bin/yq &&\
sudo chmod +x /usr/bin/yq

Example asd.yaml:

a_list:
  - key1: value1
    key2: value2
    key3: value3

Parsing root:

user@vm:~$ yq e '.' asd.yaml
a_list:
  - key1: value1
    key2: value2
    key3: value3

Parsing key3:

user@vm:~$ yq e '.a_list[0].key3' asd.yaml
value3

2021-01-21 08:10:48
