Parsing JSON with Unix tools

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' |
    sed -e 's/[{}]/''/g' | 
    awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

The above splits the JSON into fields, for example:

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...

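How do I print a specific field (denoted by the -v k=text)?

Current answer

For quickly extracting the values of a particular key, I personally like to use "grep -o", which returns only the part that matches the regex. For example, to get the "text" field from tweets, something like: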

grep -Po '"text":.*?[^\\]",' tweets.json

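This regex is more robust than you might think; for example, it deals fine with strings that contain embedded commas and escaped quotes. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it, of course.)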

To clean it up further (while keeping the string's original escaping), you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)
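For anyone who wants to see what the pattern and the clean-up steps do without a shell at hand, here is a minimal sketch of the same idea in Python. The function name and the sample line are made up for illustration; the regex is the one from the grep command, and the three substitutions mirror the perl clean-up.

```python
import re

# Same pattern as the grep -Po example: lazily match up to the first
# quote-comma pair whose preceding character is not a backslash.
TEXT_FIELD = re.compile(r'"text":.*?[^\\]",')


def extract_text_fields(line):
    """Return the raw (still JSON-escaped) value of every "text" field on a line."""
    values = []
    for match in TEXT_FIELD.findall(line):
        value = match[len('"text":'):]      # like s/"text"://
        value = re.sub(r'^"', '', value)    # like s/^"//
        value = re.sub(r'",$', '', value)   # like s/",$//
        values.append(value)
    return values


# Hypothetical sample: the value holds an escaped quote and a comma.
sample = '{"id":1,"text":"He said \\"hi\\", then left","favorited":false}'
print(extract_text_fields(sample))
```

Like the grep version, this only finds "text" when another key follows it (the pattern needs the trailing comma) and when there is no whitespace after the colon.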

To those who insist that you should use a real JSON parser: yes, that is essential for correctness, but

1. To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.

2. grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)

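For writing maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point 1.

One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want into tab-separated columns; then I pipe it through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. For this example it would be: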

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

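This approach doesn't address point 2, is less efficient than a single Python script, and is a little brittle: it forces normalization of newlines and tabs in string values, to play nicely with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.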
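To make the normalization concrete, here is a rough sketch of the kind of conversion described above. It is not the actual json2tsv script; the function name and behavior are my assumptions about the simplest version of the idea: one JSON object per input line, the requested keys in order, and tabs and newlines inside values flattened to spaces so awk sees exactly one record per line.

```python
import json


def json_to_tsv_row(line, keys):
    """Turn one JSON object (given as text) into a tab-separated row of the given keys."""
    record = json.loads(line)
    cells = []
    for key in keys:
        value = record.get(key, "")
        if not isinstance(value, str):
            # Numbers, booleans, null and nested values are re-serialized as JSON text.
            value = json.dumps(value)
        # Tabs and newlines would break awk's field/record splitting, so flatten them.
        for ch in ("\t", "\r", "\n"):
            value = value.replace(ch, " ")
        cells.append(value)
    return "\t".join(cells)


# Hypothetical tweet whose text contains a tab and a newline.
row = json_to_tsv_row('{"id": 7, "text": "a\\tb\\nc"}', ["id", "text"])
print(row)
```

The flattening is exactly the brittle part: the original whitespace in the value cannot be recovered from the TSV output.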

2011-07-27 23:24:46

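Other answers

Given that several of the recommendations here (especially in the comments) suggest using Python, I was disappointed not to find an example.

So here is a one-liner that gets a single value out of some JSON data. It assumes that you are piping the data in (from somewhere), so it should be useful in a scripting context.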

echo '{"hostname":"test","domainname":"example.com"}' | python -c 'import json,sys;obj=json.load(sys.stdin);print(obj["hostname"])'
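If the value you need is nested, the one-liner gets awkward quickly. A minimal sketch of a helper that walks a dotted path is below; the function name and the dotted-path convention are my own invention for illustration, not part of the json module.

```python
import json


def get_value(json_text, dotted_key):
    """Parse JSON text and follow a dotted path such as "a.b.0" into the result."""
    obj = json.loads(json_text)
    for part in dotted_key.split("."):
        if isinstance(obj, list):
            obj = obj[int(part)]   # list elements are addressed by index
        else:
            obj = obj[part]        # objects are addressed by key
    return obj


data = '{"hostname":"test","net":{"domains":["example.com","example.org"]}}'
print(get_value(data, "hostname"))
print(get_value(data, "net.domains.1"))
```

A missing key raises KeyError (or IndexError for lists), which is usually what you want in a script: it fails loudly instead of printing nothing.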

2011-12-06 13:05:53

There is also fx, a very simple but powerful CLI tool for processing JSON.

Examples

Using an anonymous function:

echo '{"key": "value"}' | fx "x => x.key"

Output:

value

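If you don't pass an anonymous function (param => ...), the code is automatically converted into an anonymous function, and you can access the JSON through the this keyword: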

$ echo '[1,2,3]' | fx "this.map(x => x * 2)"
[2, 4, 6]

Or you can simply use dot syntax:

echo '{"items": {"one": 1}}' | fx .items.one

Output:

1

You can pass any number of anonymous functions to reduce the JSON:

echo '{"items": ["one", "two"]}' | fx "this.items" "this[1]"

Output:

two
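In case "reduce" is unclear: each function receives the result of the previous one, starting from the parsed input. Here is a small sketch of that chaining in Python. It only illustrates the idea and says nothing about how fx is implemented; apply_steps is a made-up name.

```python
import json
from functools import reduce


def apply_steps(json_text, *steps):
    """Parse JSON text, then feed the value through each step function in order."""
    return reduce(lambda value, step: step(value), steps, json.loads(json_text))


# Mirrors the fx call above: first take .items, then take element 1.
result = apply_steps(
    '{"items": ["one", "two"]}',
    lambda x: x["items"],
    lambda x: x[1],
)
print(result)
```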

You can update existing JSON using the spread operator:

echo '{"count": 0}' | fx "{...this, count: 1}"

Output:

{"count": 1}

Just plain JavaScript. No need to learn new syntax.

Later versions of fx have an interactive mode!

2018-01-25 17:51:04

You can use jshon:

curl 'http://twitter.com/users/username.json' | jshon -e text

2012-04-11 14:48:10

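If you are looking for a native Mac solution for parsing JSON (no external libraries, etc.), then this is for you.

This information comes from https://www.macblog.org/parse-json-command-line-mac/

In short, since Mac OS Yosemite there has been a tool for running AppleScript called osascript; however, if you pass the -l 'JavaScript' flag, you can run JavaScript! This is known as JXA (JavaScript for Automation).

Below is an example of reading a JSON file for my own project.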

DCMTK_JSON=$(curl -s https://formulae.brew.sh/api/bottle/dcmtk.json) # -s for silent mode
read -r -d '' JXA <<EOF
function run() {
  var json = JSON.parse(\`$DCMTK_JSON\`);
  return json.bottles.$2.url;
}
EOF
DOWNLOAD_URL=$( osascript -l 'JavaScript' <<< "${JXA}" )
echo "DOWNLOAD_URL=${DOWNLOAD_URL}"

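What happens here is that we store the text of the function (the heredoc) in the variable JXA. The JavaScript can then simply parse the JSON content with JSON.parse(). We then pass the JXA variable containing the script to the osascript tool so that it can run the JavaScript. In my example, $2 refers to arm64_monterey if you want to test this. The JavaScript runs immediately because of the special run() function, which JXA looks for and whose output it returns when it finishes.

Note that EOF (end of file) is used to handle the multi-line text input, and the closing EOF must not have any whitespace in front of it.

You can test whether this works for you by simply opening a terminal and typing the command below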

osascript -l 'JavaScript' -e 'var app = Application.currentApplication(); app.includeStandardAdditions = true; app.displayDialog("Hello from JavaScript!");'

This should pop up a window saying hello from JavaScript.

2022-05-18 17:42:19

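Update (2020)

My biggest issue with external tools (e.g., Python) was that you have to deal with package managers and dependencies to install them.

However, now that we have jq as a standalone, static tool that is easy to install cross-platform via GitHub Releases and Webi (webinstall.dev/jq), I'd recommend that:

Mac, Linux: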

curl -sS https://webi.sh/jq | bash

Windows 10:

curl.exe -A MS https://webi.ms/jq | powershell

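Cheat sheet: https://webinstall.dev/jq

Original (2011)

TickTick is a JSON parser written in bash (less than 250 lines of code).

Here is the author's snippet from his article, "Imagine a world where Bash supports JSON":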

#!/bin/bash
. ticktick.sh

``
  people = {
    "Writers": [
      "Rod Serling",
      "Charles Beaumont",
      "Richard Matheson"
    ],
    "Cast": {
      "Rod Serling": { "Episodes": 156 },
      "Martin Landau": { "Episodes": 2 },
      "William Shatner": { "Episodes": 2 }
    }
  }
``

function printDirectors() {
  echo "  The ``people.Directors.length()`` Directors are:"

  for director in ``people.Directors.items()``; do
    printf "    - %s\n" ${!director}
  done
}

`` people.Directors = [ "John Brahm", "Douglas Heyes" ] ``
printDirectors

newDirector="Lamont Johnson"
`` people.Directors.push($newDirector) ``
printDirectors

echo "Shifted: "``people.Directors.shift()``
printDirectors

echo "Popped: "``people.Directors.pop()``
printDirectors

2011-12-10 03:49:32
