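I currently do my text file manipulation through a bunch of badly remembered AWK, sed, Bash and a tiny bit of Perl.
I have seen it mentioned in a few places that Python is good for this kind of thing. How can I use Python to replace shell scripting, AWK, sed and friends?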
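Current answer
While researching this topic, I found this proof-of-concept code (via a comment at http://jlebar.com/2010/2/1/Replacing_Bash.html) that lets you "write shell-like pipelines in Python using a terse syntax, and leveraging existing system tools where they make sense":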
for line in sh("cat /tmp/junk2") | cut(d=',',f=1) | 'sort' | uniq:
    sys.stdout.write(line)
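For comparison, here is a minimal sketch of what that particular pipeline computes, written with the standard library only. The function name unique_first_fields and the sample data are made up for illustration; they are not part of the proof-of-concept code above.

```python
def unique_first_fields(lines, delimiter=","):
    """Rough equivalent of `cut -d',' -f1 | sort | uniq` for an iterable of lines."""
    # Like cut, a line without the delimiter is kept whole.
    fields = {line.rstrip("\n").split(delimiter)[0] for line in lines}
    return sorted(fields)


# Hypothetical sample data standing in for the contents of a file.
sample = ["b,2\n", "a,1\n", "b,3\n"]
for field in unique_first_fields(sample):
    print(field)
```

To process a real file, pass an open file object instead of the list. Note that sorted() orders by code point, while the shell's sort follows the current locale, so the order can differ for non-ASCII text.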
Other answers
Yes, of course.
Take a look at these libraries, which help you never write shell scripts again (Plumbum's motto).
Plumbum
Sarge
sh
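Also, if you want to replace awk, sed and grep with something Python based, then I recommend pyp:
"The Pyed Piper", or pyp, is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.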
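Here are some observations based on my experience:
For shell:
Shell can very easily produce write-only code. Write it, and when you come back to it later you will not be able to work out what you did.
Shell can do a lot of text processing, splitting and so on in one line with pipes.
It is the best glue language when it comes to integrating calls to programs written in different programming languages.
For Python:
If you want portability to Windows, use Python.
Python can be better when you have to manipulate more than just text, such as collections of numbers. For this, I recommend Python.
I usually choose bash for most things, but when I have something that must cross the Windows boundary, I use Python.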
Any shell has several sets of features.
The essential Linux/Unix commands. All of these are available through the subprocess library. This isn't always the best first choice for running external commands. Look also at shutil for some commands that are separate Linux commands but that you could probably implement directly in your Python scripts. Another huge batch of Linux commands is in the os library; you can do these more simply in Python. As a bonus, it is also quicker. Each separate Linux command in the shell (with a few exceptions) forks a subprocess. By using the Python shutil and os modules, you don't fork a subprocess.
The shell environment features. This includes everything that sets a command's environment (current directory, environment variables and so on). You can easily manage this from Python directly.
The shell programming features. This is all the process status code checking, the various logic commands (if, while, for, etc.), the test command and all of its relatives, and the function definition machinery. This is all much, much easier in Python. This is one of the huge victories in getting rid of bash and doing it in Python.
Interaction features. This includes command history and the like. You don't need this for writing shell scripts. It is only for human interaction, not for script writing.
The shell file management features. This includes redirection and pipelines. This is trickier. Much of it can be done with subprocess, but some things that are easy in the shell are unpleasant in Python. Specifically, things like (a | b; c ) | something >result. This runs two processes in parallel (with the output of a as input to b), followed by a third process. The output from that sequence is run in parallel with something, and the output is collected into a file named result. That's just complex to express in any other language.
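To make the first and last points a little more concrete, here is a minimal sketch. The first function does the work of mkdir -p, cp and mv with os and shutil, so no subprocess is forked; the second wires a simple a | b pipeline together with subprocess.Popen. The function names, file names and the two tiny child programs are invented for the example, and the children are run through sys.executable only so that the sketch does not depend on any particular Unix tool being installed.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def file_ops_without_forking(workdir):
    """Do the work of mkdir -p, cp and mv without spawning any process."""
    nested = os.path.join(workdir, "src", "nested")
    os.makedirs(nested, exist_ok=True)           # mkdir -p src/nested

    original = os.path.join(nested, "a.txt")
    with open(original, "w") as handle:          # echo hello > a.txt
        handle.write("hello\n")

    copy = os.path.join(nested, "b.txt")
    shutil.copy(original, copy)                  # cp a.txt b.txt

    moved = os.path.join(workdir, "src", "c.txt")
    shutil.move(copy, moved)                     # mv b.txt ../c.txt
    return sorted(os.listdir(nested)), os.path.exists(moved)


def pipe_two_processes():
    """Equivalent of `a | b`: the first child's stdout feeds the second's stdin."""
    emit = "print('b'); print('a')"
    sort = "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"
    producer = subprocess.Popen([sys.executable, "-c", emit],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen([sys.executable, "-c", sort],
                                stdin=producer.stdout,
                                stdout=subprocess.PIPE, text=True)
    # Close our copy of the pipe so only the consumer holds the read end.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


with tempfile.TemporaryDirectory() as tmp:
    print(file_ops_without_forking(tmp))
print(pipe_two_processes().split())
```

Even this two-stage pipeline takes noticeably more code than a | b does in the shell, which is the trade-off described in the last point.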
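The specific programs (awk, sed, grep, etc.) can often be rewritten as Python modules. Don't go overboard. Replace what you need and evolve your "grep" module. Don't start out by writing a Python module that replaces "grep".
The best thing is that you can do this in steps.
Replace AWK and Perl with Python. Leave everything else alone.
Look at replacing grep with Python. This can be a bit more complex, but your version of grep can be tailored to your processing needs.
Look at replacing find with Python loops that use os.walk. This is a big win because you don't spawn as many processes.
Look at replacing common shell logic (loops, decisions, etc.) with Python scripts.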
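The steps above can be sketched roughly as follows. The helper names (find_files, grep, sum_column) and the sample files are hypothetical, and each function covers only one narrow use of the tool it stands in for, in the spirit of replacing what you need rather than the whole program.

```python
import os
import re
import tempfile


def find_files(root, suffix):
    """Roughly `find root -name '*suffix'`, using os.walk instead of a process."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def grep(pattern, path):
    """Roughly `grep -n pattern path`: yield (line number, line) for matches."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")


def sum_column(path, index):
    """Roughly awk's `{ total += $N }` with a 0-based, whitespace-split column."""
    total = 0.0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) > index:
                try:
                    total += float(fields[index])
                except ValueError:
                    continue  # skip non-numeric values
    return total


# Hypothetical demo data in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    log_path = os.path.join(root, "app.log")
    with open(log_path, "w", encoding="utf-8") as handle:
        handle.write("alpha 1\nbeta x\nalpha 2.5\n")
    for path in find_files(root, ".log"):
        print(list(grep(r"^alpha", path)), sum_column(path, 1))
```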
With the ShellPy library you can use Python instead of bash.
Here is an example that downloads the avatar of the Python user from GitHub:
import json
import os
import tempfile
# get the api answer with curl
answer = `curl https://api.github.com/users/python
# syntactic sugar for checking returncode of executed process for zero
if answer:
    answer_json = json.loads(answer.stdout)
    avatar_url = answer_json['avatar_url']
    destination = os.path.join(tempfile.gettempdir(), 'python.png')
    # execute curl once again, this time to get the image
    result = `curl {avatar_url} > {destination}
    if result:
        # if there were no problems show the file
        p`ls -l {destination}
    else:
        print('Failed to download avatar')
    print('Avatar downloaded')
else:
    print('Failed to access github api')
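As you can see, all expressions inside grave accent (`) symbols are executed in the shell. In Python code, you can capture the result of this execution and act on it. For example: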
log = `git log --pretty=oneline --grep='Create'
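This line first executes git log --pretty=oneline --grep='Create' in the shell and then assigns the result to the log variable. The result has the following properties:
stdout: the whole text from stdout of the executed process
stderr: the whole text from stderr of the executed process
returncode: the return code of the execution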
This is a general overview of the library; a more detailed description with examples can be found here.
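For readers who prefer to stay with the standard library, the same three pieces of information are available from subprocess.run. This is only a sketch of the idea, not ShellPy's implementation; the helper name run_command is made up, and the child process is a small Python one-liner so the example does not depend on git or curl being installed.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and capture stdout, stderr and the return code."""
    return subprocess.run(args, capture_output=True, text=True)


# Hypothetical child process that writes to both streams and exits with 3.
child_code = "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"
result = run_command([sys.executable, "-c", child_code])

# Unlike ShellPy's `if answer:` shortcut, the return code is compared explicitly.
if result.returncode == 0:
    print("succeeded:", result.stdout.strip())
else:
    print("failed with", result.returncode, "and stderr:", result.stderr.strip())
```

The object returned by subprocess.run is always truthy, so a bare if result: would not tell you whether the command succeeded.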
Pythonpy is a tool that gives easy access to many of the features of awk and sed, but using Python syntax:
$ echo me2 | py -x 're.sub("me", "you", x)'
you2
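Inside a script, the same substitution needs nothing beyond the re module. A minimal sketch, with substitute as a made-up helper name:

```python
import re


def substitute(lines, pattern, replacement):
    """Apply a sed-style `s/pattern/replacement/g` to each line."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line)


for line in substitute(["me2"], "me", "you"):
    print(line)
```

Because substitute accepts any iterable of lines, it can also be fed an open file or sys.stdin.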