How to find/identify large commits in git history?

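I have a 300 MB git repo. The total size of the files I currently have checked out is 2 MB, and the rest of the git repo accounts for 298 MB. This is basically a code-only repo that shouldn't be more than a few MB.

I suspect someone accidentally committed some large files (videos, images, etc.) and then removed them... but not from git, so the history still contains useless large files. How can I find the large files in the git history? There are 400+ commits, so going through them one by one isn't practical.

NOTE: my question is not about how to remove the file, but how to find it in the first place.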

Current answer

Get a feel for the "diff size" of the most recent commits in the git history with:

git log --stat

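This shows the diff size in lines: lines added and lines removed. For binary files it shows the change in bytes instead (e.g. "Bin 0 -> 1048576 bytes"), which makes large media files easy to spot.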
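To rank all 400+ commits instead of scrolling, the machine-readable --numstat form of the same log can be summarized. In that format each file line is "added<TAB>deleted<TAB>path", and binary files show "-" instead of counts. The sketch below assumes that format and the log invocation shown in its comments; rank_commits_by_diff is a hypothetical helper name. It counts lines and binary files per commit, not bytes, so a high binary count is a hint to inspect that commit rather than a size measurement.

```shell
# Hypothetical helper: rank commits by lines changed and count binary files.
# Assumed input: the output of
#   git log --all --numstat --format='commit %H'
# where numstat lines are "<added>\t<deleted>\t<path>" and binary files
# appear as "-\t-\t<path>".
# Prints "<lines-changed> <binary-files> <sha>", largest line count first.
rank_commits_by_diff() {
  awk -F'\t' '
    # header line written by --format: remember the current commit
    /^commit / { sha = substr($0, 8); seen[sha] = 1; next }
    # numstat line: binary files are counted, text files summed
    NF == 3 {
      if ($1 == "-") binary[sha]++
      else lines[sha] += $1 + $2
    }
    END {
      for (s in seen) printf "%d %d %s\n", lines[s], binary[s], s
    }
  ' | sort -k1,1nr
}

# Usage, from inside the repository:
#   git log --all --numstat --format='commit %H' | rank_commits_by_diff | head
```

Merge commits show no numstat lines by default, so they appear with "0 0".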

2022-10-11 18:04:48

Other answers

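How to track down large files in git history?

Start by analyzing, validating and selecting the root cause. Use git-repo-analysis to help.

You may also find some value in the detailed reports generated by BFG Repo-Cleaner, which can be run very quickly by cloning to a DigitalOcean droplet and using its 10 MiB/s network throughput.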

2017-05-26 11:38:06

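You should use BFG Repo-Cleaner.

According to its website:

The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data

The classic procedure for reducing the size of a repository is: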

git clone --mirror git://example.com/some-big-repo.git
java -jar bfg.jar --strip-biggest-blobs 500 some-big-repo.git
cd some-big-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push
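To check how much space the object store actually takes (for example before and after the steps above, or to confirm that the 298 MB lives in history rather than in the checkout), git count-objects -v reports the size of loose objects ("size") and of packfiles ("size-pack") in KiB. The sketch below assumes that plain -v output (without -H); object_store_kib is a hypothetical helper name.

```shell
# Hypothetical helper: total object-store size in KiB.
# Assumed input: the output of "git count-objects -v" (no -H), where
# "size" (loose objects) and "size-pack" (packfiles) are given in KiB.
object_store_kib() {
  awk -F': ' '
    $1 == "size" || $1 == "size-pack" { total += $2 }
    END { print total + 0 }
  '
}

# Usage, from inside the repository:
#   git count-objects -v | object_store_kib
```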

2014-03-11 18:45:18

I've found this script very useful in the past for finding large (and non-obvious) objects in a git repository:

http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/

#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs
#
# Only packed objects are listed; run "git gc" first if many objects are still loose.

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n'

# list all objects including their size, sort by size, take top 10
# (keep only object lines, dropping the "non delta" / "chain length" summary lines)
packDir="$(git rev-parse --git-dir)/objects/pack"
objects=$(git verify-pack -v "$packDir"/pack-*.idx | grep -E '^[0-9a-f]+ +(blob|tree|commit|tag) ' | sort -k3nr | head)

echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
allObjects=$(git rev-list --all --objects)
for y in $objects
do
    # extract the size in bytes and convert to kB; awk splits on runs of
    # spaces, so the padding after the object type does not shift columns
    size=$(( $(echo "$y" | awk '{print $3}') / 1024 ))
    # extract the compressed (in-pack) size in bytes and convert to kB
    compressedSize=$(( $(echo "$y" | awk '{print $4}') / 1024 ))
    # extract the SHA
    sha=$(echo "$y" | awk '{print $1}')
    # find the object's location in the repository tree ("<sha> <path>")
    other=$(echo "${allObjects}" | grep "$sha")
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e "$output" | column -t -s ', '
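An alternative to parsing verify-pack output, not used by the script above, is to ask git cat-file for the size of every object reachable from any ref; this also covers loose objects. The sketch below assumes the batch-check format shown in its comments, and largest_blobs is a hypothetical helper name. Sizes are uncompressed bytes, not the in-pack size the script reports.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref.
# Assumed input: the output of
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'
# i.e. one "<type> <sha> <size-in-bytes> <path>" line per object.
# $1 is the number of rows to print (default 10).
largest_blobs() {
  awk '$1 == "blob" { sub(/^blob /, ""); print }' |  # keep blobs, drop the type
    sort -k2,2nr |                                   # sort by size, largest first
    head -n "${1:-10}"
}

# Usage, from inside the repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 20
```

Each output line is "<sha> <size> <path>", so the path is shown even for files long since deleted from the checkout.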

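This will give you the object name (SHA-1) of each large blob, and then you can use a script like the one in this question:

Which commit has this blob?

...to find the commits that point to each of those blobs.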
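One way to do that lookup without a separate script is to scan the raw diff output of git log, where each changed file appears as ":<oldmode> <newmode> <oldsha> <newsha> <status><TAB><path>". The sketch below is an illustrative technique, not the linked script; it assumes the log invocation in its comments, and blob_commits is a hypothetical helper name. It reports commits that add or remove the blob; merge commits are not diffed by default and so are not reported. Newer Git versions also offer git log --find-object=<sha> for the same purpose.

```shell
# Hypothetical helper: list commits whose diff adds or removes a given blob.
# Assumed input: the output of
#   git log --all --raw --no-abbrev --format='commit %H'
# $1 is the blob SHA (a unique prefix also works).
blob_commits() {
  awk -v blob="$1" '
    /^commit / { sha = $2; next }
    # raw diff line: $3 is the old blob SHA, $4 the new one
    /^:/ {
      if (index($3, blob) == 1 || index($4, blob) == 1) {
        if (!(sha in printed)) { print sha; printed[sha] = 1 }
      }
    }
  '
}

# Usage, from inside the repository:
#   git log --all --raw --no-abbrev --format='commit %H' | blob_commits <blob-sha>
```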

2012-05-16 16:01:20

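If you're on Windows, here is a PowerShell script that prints the 10 largest files in your repository. Run it from the repository root. Because it uses Test-Path and Get-Item, it only sees paths that still exist in the working tree and reports their current on-disk sizes, so files already deleted from the checkout will not show up: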

$revision_objects = git rev-list --objects --all;
$files = $revision_objects.Split() | Where-Object {$_.Length -gt 0 -and $(Test-Path -Path $_ -PathType Leaf) };
$files | Get-Item -Force | select fullname, length | sort -Descending -Property Length | select -First 10

2016-05-14 23:19:04
