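How to read the output from git diff?

The man page for git-diff is rather long, and it explains many cases that don't seem necessary for a beginner. For example: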

git diff origin/master

On my Mac:

info diff then select: Output formats -> Context -> Unified format -> Detailed Unified:

Or, in the GNU diff manual, follow the same path to the same section:

File: diff.info,  Node: Detailed Unified,  Next: Example Unified,  Up: Unified Format

Detailed Description of Unified Format
......................................

The unified output format starts with a two-line header, which looks
like this:

     --- FROM-FILE FROM-FILE-MODIFICATION-TIME
     +++ TO-FILE TO-FILE-MODIFICATION-TIME

The time stamp looks like `2002-02-21 23:30:39.942229878 -0800' to
indicate the date, time with fractional seconds, and time zone.

You can change the header's content with the `--label=LABEL' option;
see *Note Alternate Names::.

Next come one or more hunks of differences; each hunk shows one area
where the files differ.  Unified format hunks look like this:

     @@ FROM-FILE-RANGE TO-FILE-RANGE @@
      LINE-FROM-EITHER-FILE
      LINE-FROM-EITHER-FILE...

The lines common to both files begin with a space character.  The
lines that actually differ between the two files have one of the
following indicator characters in the left print column:

`+'
     A line was added here to the first file.

`-'
     A line was removed here from the first file.

2010-03-27 14:09:19

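The default output format (which originally comes from a program known as diff, if you want to look for more information) is known as a "unified diff". It contains essentially 4 different types of lines:

context lines, which start with a single space,
insertion lines, which start with a + and show a line that has been inserted,
deletion lines, which start with a -, and
metadata lines, which describe higher-level things such as which file this is talking about, what options were used to generate the diff, whether the file changed its permissions, and so on.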
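The four line types can be told apart by their first characters. Here is a minimal sketch of that idea; the function name classify_diff_line is made up for illustration, and the classifier is deliberately naive: it looks at one line at a time, so a deleted line whose own text begins with "-- " would be mistaken for a file header.

```python
def classify_diff_line(line: str) -> str:
    """Return a rough category for one line of unified diff output."""
    # File headers are checked before the generic +/- tests,
    # because they also begin with '+' or '-'.
    if line.startswith("--- ") or line.startswith("+++ "):
        return "metadata"
    # Hunk headers such as "@@ -1,5 +1,5 @@" are metadata too.
    if line.startswith("@@"):
        return "metadata"
    if line.startswith("+"):
        return "insertion"
    if line.startswith("-"):
        return "deletion"
    if line.startswith(" "):
        return "context"
    # Everything else (diff --git, index, rename from, ...) is header metadata.
    return "metadata"


for sample in ["--- a/file", " line1", "-this line will be deleted", "+this line is added"]:
    print(f"{classify_diff_line(sample):10} {sample}")
```

A real parser would track whether it is inside a hunk before interpreting the leading character.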

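I advise that you practice reading diffs between two versions of a file where you know exactly what you changed. That way you'll recognize what is going on when you see it.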

2010-03-27 14:33:23

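It's unclear from your question which part of the diff you find confusing: the actual diff, or the extra header information git prints. Just in case, here's a quick overview of the header.

The first line is something like diff --git a/path/to/file b/path/to/file. Obviously it's just telling you which file this section of the diff is for. If you set the boolean config variable diff.mnemonicPrefix, the a and b will be changed to more descriptive letters like c and w (commit and work tree).

Next, there are "mode lines": lines describing any changes that don't involve changing the content of the file. This includes new/deleted files, renamed/copied files, and permission changes.

Finally, there's a line like index 789bd4..0afb621 100644. You'll probably never care about it, but those short hex numbers are the abbreviated SHA1 hashes of the old and new blobs for this file (a blob is a git object storing raw data such as a file's contents). And of course, 100644 is the mode of the file: the last three digits are obviously permissions; the first three give extra file metadata information (there is an SO post describing that).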
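To make the pieces of the index line concrete, here is a small sketch that splits one apart with a regular expression. The names INDEX_RE and parse_index_line are my own, and treating the trailing mode as optional is an assumption of this sketch, not something stated above.

```python
import re

# "index <old-blob>..<new-blob> <mode>"; the mode is treated as optional here.
INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)(?: (\d{6}))?$")


def parse_index_line(line: str) -> dict:
    """Split an 'index' header line into blob hashes and mode parts."""
    match = INDEX_RE.match(line)
    if match is None:
        raise ValueError(f"not an index line: {line!r}")
    old_blob, new_blob, mode = match.groups()
    result = {"old_blob": old_blob, "new_blob": new_blob, "mode": mode}
    if mode is not None:
        result["file_type"] = mode[:3]    # first three digits: file metadata
        result["permissions"] = mode[3:]  # last three digits: permissions
    return result


info = parse_index_line("index 789bd4..0afb621 100644")
print(info)
```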

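After that, you're on to standard unified diff output (just like the classic diff -U). It's split up into hunks: a hunk is a section of the file containing changes and their context. The hunks for a file are preceded by a pair of --- and +++ lines denoting the file in question, and each hunk opens with an @@ line giving its line ranges. The actual diff then shows (by default) three lines of context on either side of the - and + lines that mark the removed and added lines.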

2010-03-27 14:57:51

Let's take a look at an example of an advanced diff from git history (in commit 1088261f in the git.git repository):

diff --git a/builtin-http-fetch.c b/http-fetch.c
similarity index 95%
rename from builtin-http-fetch.c
rename to http-fetch.c
index f3e63d7..e8f44ba 100644
--- a/builtin-http-fetch.c
+++ b/http-fetch.c
@@ -1,8 +1,9 @@
 #include "cache.h"
 #include "walker.h"
 
-int cmd_http_fetch(int argc, const char **argv, const char *prefix)
+int main(int argc, const char **argv)
 {
+       const char *prefix;
        struct walker *walker;
        int commits_on_stdin = 0;
        int commits;
@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv, const char *prefix)
        int get_verbosely = 0;
        int get_recover = 0;
 
+       prefix = setup_git_directory();
+
        git_config(git_default_config, NULL);
 
        while (arg < argc && argv[arg][0] == '-') {

Let's analyze this patch line by line.

The first line

    diff --git a/builtin-http-fetch.c b/http-fetch.c

is a "git diff" header in the form diff --git a/file1 b/file2. The a/ and b/ filenames are the same unless a rename/copy is involved (as in our case). The --git means that the diff is in the "git" diff format.

Next are one or more extended header lines. The first three

    similarity index 95%
    rename from builtin-http-fetch.c
    rename to http-fetch.c

tell us that the file was renamed from builtin-http-fetch.c to http-fetch.c and that those two files are 95% identical (which was used to detect this rename).

The last line in the extended diff header, which is

    index f3e63d7..e8f44ba 100644

tells us about the mode of the given file (100644 means that it is an ordinary file and not e.g. a symlink, and that it doesn't have the executable permission bit), and about the shortened hashes of the preimage (the version of the file before the given change) and the postimage (the version of the file after the change). This line is used by git am --3way to try to do a 3-way merge if the patch cannot be applied by itself.

Next is the two-line unified diff header

    --- a/builtin-http-fetch.c
    +++ b/http-fetch.c

Compared to the diff -U result, it has neither from-file-modification-time nor to-file-modification-time after the source (preimage) and destination (postimage) file names. If the file was created, the source is /dev/null; if the file was deleted, the target is /dev/null.
If you set the diff.mnemonicPrefix configuration variable to true, in place of the a/ and b/ prefixes in this two-line header you can instead have c/, i/, w/ and o/ as prefixes, according to what you compare; see git-config(1).

Next come one or more hunks of differences; each hunk shows one area where the files differ. Unified format hunks start with a line like

    @@ -1,8 +1,9 @@

or

    @@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv, ...

It is in the format @@ from-file-range to-file-range @@ [header]. The from-file-range is in the form -<start line>,<number of lines>, and the to-file-range is +<start line>,<number of lines>. Both start-line and number-of-lines refer to the position and length of the hunk in the preimage and postimage, respectively. If number-of-lines is not shown, it means that it is 1.

The optional header shows the C function where each change occurs, if it is a C file (like the -p option in GNU diff), or the equivalent, if any, for other types of files.
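The hunk header format described above can be pulled apart with a regular expression. This is only a sketch: HUNK_RE and parse_hunk_header are names I made up, and the only rule encoded beyond the basic layout is the one stated above, that a missing number-of-lines means 1.

```python
import re

# @@ -<start>[,<count>] +<start>[,<count>] @@ [header]
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def parse_hunk_header(line: str) -> dict:
    """Parse a unified-diff hunk header into its ranges and optional header text."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count, section = match.groups()
    return {
        "old_start": int(old_start),
        # If number-of-lines is not shown, it is 1.
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
        "section": section,
    }


header = parse_hunk_header(
    "@@ -18,6 +19,8 @@ int cmd_http_fetch(int argc, const char **argv, const char *prefix)"
)
print(header)
```

For the second hunk of the example patch this gives an old range of 6 lines starting at line 18 and a new range of 8 lines starting at line 19; the difference of 2 is the two added lines.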

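Next comes the description of where the files differ. The lines common to both files begin with a space character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:

'+': A line was added here to the first file.
'-': A line was removed here from the first file.

So, for example, the first chunk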

     #include "cache.h"
     #include "walker.h"
     
    -int cmd_http_fetch(int argc, const char **argv, const char *prefix)
    +int main(int argc, const char **argv)
     {
    +       const char *prefix;
            struct walker *walker;
            int commits_on_stdin = 0;
            int commits;

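means that cmd_http_fetch was replaced by main, and that the const char *prefix; line was added.

In other words, before the change, the relevant fragment of what was then the 'builtin-http-fetch.c' file looked like this: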

    #include "cache.h"
    #include "walker.h"
    
    int cmd_http_fetch(int argc, const char **argv, const char *prefix)
    {
           struct walker *walker;
           int commits_on_stdin = 0;
           int commits;

After the change, this fragment of what is now the 'http-fetch.c' file looks like this instead:

    #include "cache.h"
    #include "walker.h"
     
    int main(int argc, const char **argv)
    {
           const char *prefix;
           struct walker *walker;
           int commits_on_stdin = 0;
           int commits;
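The same before/after reading can be done mechanically: context lines go to both versions, - lines only to the old one, + lines only to the new one. Here is a minimal sketch; split_hunk is a hypothetical helper name, and the input is just the body of the first hunk without its @@ line.

```python
def split_hunk(hunk_lines):
    """Rebuild the old and new fragments from the body lines of one hunk."""
    old, new = [], []
    for line in hunk_lines:
        if line.startswith("\\"):
            continue  # skip the "\ No newline at end of file" marker
        marker, text = line[:1], line[1:]
        if marker == "-":
            old.append(text)      # present only before the change
        elif marker == "+":
            new.append(text)      # present only after the change
        else:
            old.append(text)      # context: present in both versions
            new.append(text)
    return old, new


hunk = [
    ' #include "cache.h"',
    ' #include "walker.h"',
    ' ',
    '-int cmd_http_fetch(int argc, const char **argv, const char *prefix)',
    '+int main(int argc, const char **argv)',
    ' {',
    '+       const char *prefix;',
    '        struct walker *walker;',
    '        int commits_on_stdin = 0;',
    '        int commits;',
]
old, new = split_hunk(hunk)
print(len(old), len(new))
```

The two lengths are the 8 and 9 from the hunk header @@ -1,8 +1,9 @@.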

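There might be a \ No newline at end of file line present (it is not in the example diff).

As Donal Fellows said, it is best to practice reading diffs on real-life examples, where you know what you have changed.

References:

git-diff(1) manpage, section "Generating patches with -p"
(diff.info)Detailed Unified node, "Detailed Description of Unified Format".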

2010-03-27 16:28:28

Here's a simple example.

diff --git a/file b/file 
index 10ff2df..84d4fa2 100644
--- a/file
+++ b/file
@@ -1,5 +1,5 @@
 line1
 line2
-this line will be deleted
 line4
 line5
+this line is added

Here's an explanation:

--git is not a command; it means this is the git version of diff (not the Unix one).

a/ and b/ are directories, but they are not real. It's just a convenience when we deal with the same file (in my case a/ is in the index and b/ is in the working directory).

10ff2df..84d4fa2 are the blob IDs of these 2 files.

100644 is the "mode bits", indicating that this is a regular file (not executable and not a symbolic link).

--- a/file +++ b/file: minus signs show lines that are in the a/ version but missing from the b/ version, and plus signs show lines that are missing in a/ but present in b/ (in my case --- means deleted lines and +++ means added lines in b/, which is the file in the working directory).

@@ -1,5 +1,5 @@: to understand this it's better to work with a big file. If you have two changes in different places, you'll get two entries like @@ -1,5 +1,5 @@. Suppose you have a file with line1 ... line100, and you delete line10 and add a new line after line100. You'll get:

@@ -7,7 +7,6 @@ line6
 line7
 line8
 line9
-this line10 to be deleted
 line11
 line12
 line13
@@ -98,3 +97,4 @@ line97
 line98
 line99
 line100
+this is new line100

2014-09-19 10:31:48

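The @@ -1,2 +3,4 @@ part of the diff

This part took me a while to understand, so I've created a minimal example.

The format is basically the same as the diff -u unified diff.

For instance: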

diff -u <(seq 16) <(seq 16 | grep -Ev '^(2|3|14|15)$')

Here we removed lines 2, 3, 14 and 15. Output:

@@ -1,6 +1,4 @@
 1
-2
-3
 4
 5
 6
@@ -11,6 +9,4 @@
 11
 12
 13
-14
-15
 16

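@@ -1,6 +1,4 @@ means:

-1,6 means that this piece of the first file starts at line 1 and shows a total of 6 lines. Therefore it shows lines 1 to 6:

    1
    2
    3
    4
    5
    6

- means "old", as we usually invoke it as diff -u old new.

+1,4 means that this piece of the second file starts at line 1 and shows a total of 4 lines. Therefore it shows lines 1 to 4.

+ means "new".

We only have 4 lines instead of 6 because 2 lines were removed! The new hunk is just:

    1
    4
    5
    6

@@ -11,6 +9,4 @@ for the second hunk is analogous:

On the old file, we have 6 lines, starting at line 11 of the old file:

    11
    12
    13
    14
    15
    16

On the new file, we have 4 lines, starting at line 9 of the new file:

    11
    12
    13
    16

Note that line 11 is the 9th line of the new file because we have already removed 2 lines in the previous hunk: 2 and 3.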
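If you don't have a shell with process substitution handy, the same experiment can be repeated with Python's standard difflib module, which also emits unified diffs. This is a sketch of the same seq 16 example; it uses difflib rather than GNU diff or git, so only the hunk headers are compared here, and the variable names are mine.

```python
import difflib

# Old file: the numbers 1..16, one per line. New file: lines 2, 3, 14, 15 removed.
old = [str(n) for n in range(1, 17)]
new = [line for line in old if line not in {"2", "3", "14", "15"}]

# The default of 3 context lines matches what diff -u and git diff use.
diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))
hunk_headers = [line for line in diff if line.startswith("@@")]

for line in diff:
    print(line)

# The second hunk starts at old line 11; two lines were removed before it,
# so in the new file it starts at 11 - 2 = 9.
removed_before_second_hunk = 2
print(11 - removed_before_second_hunk)
```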

Hunk header

Depending on your git version and configuration, you can also get a code line next to the @@ line, e.g. the func1() { in:

@@ -4,7 +4,6 @@ func1() {

This can also be obtained with the -p flag of plain diff.

Example: old file:

func1() {
    1;
    2;
    3;
    4;
    5;
    6;
    7;
    8;
    9;
}

If we remove the line 6;, the diff shows:

@@ -4,7 +4,6 @@ func1() {
     3;
     4;
     5;
-    6;
     7;
     8;
     9;

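Note that the line shown after @@ is not the line right above the hunk: it skipped over 1; and 2; and shows the func1() { line instead.

This awesome feature often tells you exactly which function or class each hunk belongs to, which is very useful for interpreting the diff.

How exactly the algorithm that chooses the header works is discussed at: Where does the excerpt in the git diff hunk header come from?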
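As a rough illustration of the idea only, here is a sketch that picks the nearest line above the hunk that starts in the first column with a letter, an underscore or a dollar sign. That rule is my assumption about the default behaviour, not git's actual implementation, and language-specific patterns can change what git really prints. The name guess_hunk_section is made up.

```python
def guess_hunk_section(old_lines, hunk_old_start):
    """Guess the text shown after '@@ ... @@' for a hunk (simplified heuristic).

    old_lines is the old file as a list of lines; hunk_old_start is the
    1-based start line taken from the hunk header.
    """
    # Scan upwards through the lines strictly above the hunk.
    for index in range(hunk_old_start - 2, -1, -1):
        line = old_lines[index]
        # Assumed rule: an unindented line beginning with a letter, '_' or '$'.
        if line and (line[0].isalpha() or line[0] in "_$"):
            return line
    return ""


# The old file from the example: func1() { ... } with statements 1; to 9;
old_file = ["func1() {"] + [f"    {n};" for n in range(1, 10)] + ["}"]
section = guess_hunk_section(old_file, 4)
print(section)
```

With the hunk starting at old line 4, the indented lines 2; and 1; are skipped and the func1() { line is chosen, which matches the header in the example.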

2015-07-24 16:25:38
