
How to produce a loose ref+size explosion via pruning + git-gc




I'll probably submit docs for this eventually, but the docs in my
--prune-tags series were already hard enough to review. Try running this:

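    (
        # Start from a fresh clone; record its size and the number of
        # commits reachable from any ref but not from origin/master.
        rm -rf /tmp/git &&
        git clone https://github.com/git/git /tmp/git &&
        cd /tmp/git >/dev/null &&
        du -sh .git &&
        git rev-list --all origin/master.. | wc -l &&
        # Fetch a few diverged forks, then repack.
        for clone in gitster peff avar chriscool mhagger pclouds Microsoft
        do
            git remote add "$clone" "https://github.com/$clone/git" &&
            git fetch -q "$clone"
        done &&
        git gc &&
        du -sh .git &&
        git rev-list --all origin/master.. | wc -l &&
        # Prune tags that came from the forks, remove the fork remotes
        # (and with them their remote-tracking refs), then gc again.
        git fetch -q origin --prune 'refs/tags/*:refs/tags/*' &&
        for remote in $(git remote | grep -v origin)
        do
            git remote rm "$remote"
        done &&
        git gc &&
        du -sh .git &&
        git rev-list --all origin/master.. | wc -l
    )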

The output is:

    108M    .git
    2222
    160M    .git
    62220
    1.9G    .git
    2222

I.e. a fresh clone of git.git is 108MB; add a few more repos from its
network that have diverged quite a bit and it's 160MB repacked.

Now remove those remotes and run "git gc" and it's 1.9GB, even though it
diverges from master by the same 2222 commits as the 108MB clone did. But
after running:

    git prune --expire=now

It becomes ~108MB again.

Now this is all expected behavior: we've made a bunch of objects
unreferenced, so they all get exploded into loose objects, which take a
lot of space because each loose object is compressed on its own rather
than stored as a delta in a pack.
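
One way to check whether a repository is in this state is
"git count-objects -v", which reports loose objects ("count", "size")
separately from packed ones ("in-pack", "size-pack"). A minimal sketch on
a throwaway repository follows; the temporary directory and the blob
written with "git hash-object" are only stand-ins for real unreferenced
objects, and the "loose" variable is just for illustration:

```shell
# Throwaway repository; nothing here touches a real checkout.
repo=$(mktemp -d) &&
cd "$repo" &&
git init -q . &&

# Write a blob that no ref points at, i.e. an unreferenced loose object.
oid=$(echo "nobody references me" | git hash-object -w --stdin) &&

# Full report: loose objects and packed objects are listed separately.
git count-objects -v &&

# Pull out just the number of loose objects from the "count:" line.
loose=$(git count-objects -v | sed -n 's/^count: //p') &&
echo "loose objects: $loose"
```

On a repository that has exploded as described above, "count" and "size"
would be expected to dominate the report.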

It's an interesting caveat when setting fetch.prune=true on checkouts
that didn't previously have it and might have lots of branches to be
pruned.

For reasons I won't go into I'd had that disabled for a while here at
work, and after re-enabling it we had some repos whose .git is usually
2.5G explode to 30G once git-gc ran.

The workaround is to set gc.pruneExpire low enough that all those
objects get deleted when the gc hits. I set it to 1 day (from the
default of 2 weeks).
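
For reference, a sketch of that configuration, assuming "1.day.ago" as
the spelling of "1 day" (the documented default is "2.weeks.ago"). The
throwaway repository is only there so the snippet can be run anywhere;
in practice you'd run the two "git config" commands inside the affected
checkout, or with --global:

```shell
# Throwaway repository standing in for a real checkout.
repo=$(mktemp -d) &&
cd "$repo" &&
git init -q . &&

# Prune stale remote-tracking refs on every fetch.
git config fetch.prune true &&

# Let gc delete unreachable objects older than one day instead of two weeks.
git config gc.pruneExpire 1.day.ago &&

# Read the values back to confirm what gc and fetch will use.
expire=$(git config gc.pruneExpire) &&
prune=$(git config fetch.prune) &&
echo "gc.pruneExpire=$expire fetch.prune=$prune"
```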

But that doesn't help with repos that have already run git-gc and
exploded in size, much to the confusion of users on those systems. Those
need a manual git-prune.
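
A sketch of how that manual cleanup could be scripted over several
checkouts. "prune_exploded" is a hypothetical helper name, not a git
command, and it assumes non-bare repositories with a .git directory:

```shell
# Hypothetical helper: run "git prune --expire=now" in each given
# checkout and print the size of its .git directory before and after.
prune_exploded () {
    for repo in "$@"
    do
        echo "before: $(du -sh "$repo/.git" | cut -f1) $repo"
        git -C "$repo" prune --expire=now || return 1
        echo "after:  $(du -sh "$repo/.git" | cut -f1) $repo"
    done
}
```

It would be called as "prune_exploded /path/to/repo1 /path/to/repo2",
where the paths are placeholders. Since --expire=now drops unreachable
objects regardless of age, it's probably best run when nothing else is
writing to those repositories.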

Potential solutions to this have been discussed ad nauseam here on the
list. Let's not go into that (unless someone feels like it).

I mainly wanted to send this for later reference, and have some
searchable record in case someone's confused when they turn on prune and
their repo increases to 10x the previous size.