
Re: git svn clone/fetch hits issues with gc --auto

On Wed, Oct 10 2018, Martin Langhoff wrote:

> Looking around, Jonathan Tan's "[PATCH] gc: do not warn about too many
> loose objects" makes sense to me.
> - remove unactionable warning
> - as the warning is gone, no gc.log is produced
> - subsequent gc runs don't exit due to gc.log
> My very humble +1 on that.
> As for downsides... if we have truly tons of _recent_ loose objects,
> it'll ... take disk space? I'm fine with that.

As Jeff's and my https://public-inbox.org/git/878t69dgvx.fsf@xxxxxxxxxxxxxxxxxxx/
replies note, it's a bit more complex than that.

 - The warning is actionable: you can decide to change your expiration
   policy (gc.pruneExpire).

 - We use this warning as a proxy for "let's not run for a day";
   otherwise we'll just grind on gc --auto trying to consolidate
   possibly many hundreds of thousands of loose objects, only to find
   that none of them can be pruned because they run into the expiry
   policy. With the warning we retry that once per day, which sucks less.

 - This conflation of the user-visible warning and the policy is an
   emergent effect of how the different gc pieces interact, which, as I
   note in the linked thread(s), sucks.

   But we can't just yank one piece away (as Jonathan's patch does)
   without throwing the baby out with the bathwater.

   It will mean that e.g. if you have 10k loose objects in your git.git,
   and created them just now, then every time you run anything that runs
   "gc --auto" we'll fork to the background, peg a core at 100% CPU for
   2-3 minutes or whatever it is, only to get nowhere, and do the same
   thing again in ~3 minutes when you run your next command.

 - I think you may be underestimating some of the cases where this ends
   up taking a huge amount of disk space (and now we'll at least issue
   *some* warning). See my
   where a repo's .git went from 2.5G to 30G due to being stuck in this
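To make the "warning as a once-a-day throttle" interaction concrete, here is a simplified model in Perl of the decision gc --auto makes. It is a sketch, not git's code (the real logic is C in builtin/gc.c). It assumes the defaults gc.auto=6700, gc.logExpiry=1.day and gc.autoDetach=true, plus git's trick of counting loose objects only in objects/17 and extrapolating. The sub names and return strings are hypothetical.

```perl
use strict;
use warnings;

# Simplified, hypothetical model of the "gc --auto" gate; not git's code.
use constant GC_AUTO        => 6700;        # default gc.auto
use constant LOG_EXPIRY_SEC => 24 * 3600;   # default gc.logExpiry (1.day)

# git samples only objects/17 and extrapolates, so the per-directory
# threshold is ceil(gc.auto / 256), i.e. 27 with the default.
sub too_many_loose {
    my ($loose_in_dir_17) = @_;
    my $per_dir = int((GC_AUTO + 255) / 256);
    return $loose_in_dir_17 > $per_dir ? 1 : 0;
}

# Returns 'skip-clean', 'skip-gc.log' or 'run'.
# Args: loose_in_17 => count, gc_log_age => seconds (undef if no gc.log).
sub auto_gc_decision {
    my (%s) = @_;
    return 'skip-clean' unless too_many_loose($s{loose_in_17});
    # A recent gc.log (e.g. holding the "too many unreachable loose
    # objects" warning from the last background run) suppresses this run.
    if (defined $s{gc_log_age} && $s{gc_log_age} < LOG_EXPIRY_SEC) {
        return 'skip-gc.log';
    }
    return 'run';
}

for my $age (undef, 3600, 2 * 86400) {
    printf "gc.log age %s: %s\n", defined $age ? $age : 'none',
        auto_gc_decision(loose_in_17 => 39, gc_log_age => $age);
}
```

With 10k freshly created loose objects, objects/17 would hold roughly 10000/256, i.e. about 39, so in this model every command gets 'run' unless a recent gc.log exists. That gc.log branch is the only thing limiting the retries to once per gc.logExpiry.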

> For more aggressive gc options, thoughts:
>  - Do we always consider git gc --prune=now "safe" in a "won't delete
> stuff the user is likely to want" sense? For example -- are the
> references from reflogs enough safety?

The --prune=now option is not generally safe for the reasons noted in
the "NOTES" section in "git help gc".
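The expiry rule behind both the "can't prune anything" loop and the --prune=now risk can be modelled roughly as follows. This is a sketch with made-up object records, not git's data structures: an unreachable loose object is deleted only if its mtime is at or before the cutoff (default gc.pruneExpire is 2.weeks.ago), roughly mirroring the "mtime newer than expire, keep it" check in git prune. --prune=now moves the cutoff to the present, so objects another process has just written but not yet referenced become eligible too.

```perl
use strict;
use warnings;

# Simplified model of the prune rule (hypothetical records): an object is
# deleted only if it is unreachable (from refs, reflogs, the index, ...)
# and its mtime is at or before the cutoff.
sub prunable {
    my ($objects, $cutoff) = @_;
    return grep { !$_->{reachable} && $_->{mtime} <= $cutoff } @$objects;
}

my $now       = time;
my $two_weeks = 14 * 24 * 3600;   # default gc.pruneExpire is 2.weeks.ago
my @objects   = (
    { id => 'old-garbage',   reachable => 0, mtime => $now - 30 * 86400 },
    { id => 'fresh-garbage', reachable => 0, mtime => $now - 60 },
    { id => 'in-use',        reachable => 1, mtime => $now - 30 * 86400 },
);

# Default policy: fresh garbage survives, so gc --auto keeps seeing
# "too many" loose objects that it is not allowed to delete.
print join(' ', map { $_->{id} } prunable(\@objects, $now - $two_weeks)), "\n";

# --prune=now: all unreachable objects go, including ones a concurrent
# writer may have just created but not yet referenced.
print join(' ', map { $_->{id} } prunable(\@objects, $now)), "\n";
```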

>  - Even if we don't, for some commands it should be safe to run git gc
> --prune=now at the end of the process, for example an import that
> generates a new git repo (git svn clone).

Yeah, I don't see a problem with that. I didn't know about this
interesting use-case, i.e. that "git svn clone" will create a lot of
loose objects.

As seen in my
https://public-inbox.org/git/87tvm3go42.fsf@xxxxxxxxxxxxxxxxxxx/ I'm
working on making "gc --auto" run at the end of clone for unrelated
reasons (so that we generate the commit-graph). It seems like "git svn
clone" could do something similar.
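One way git-svn could act on Martin's suggestion is sketched below. This is a hypothetical helper, not existing git-svn code: it picks the gc invocation based on whether this run created the repository. The assumption is that a repository created by the import itself has no concurrent writers whose brand-new objects --prune=now could delete, while an incremental fetch into an existing repository should keep the normal expiry policy.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of git-svn: choose the "git gc" arguments
# to run once an import has finished.
sub post_import_gc_args {
    my (%opt) = @_;
    if ($opt{fresh_clone}) {
        # The repository was created by this very run, so (by assumption)
        # no other process is writing objects that --prune=now could delete.
        return ('gc', '--prune=now');
    }
    # Incremental "git svn fetch" into an existing repo: keep the default
    # expiry policy and let gc --auto decide.
    return ('gc', '--auto');
}

# In git-svn the list could be handed to Git.pm's command_noisy(); here it
# is only printed.
print join(' ', 'git', post_import_gc_args(fresh_clone => 1)), "\n";
print join(' ', 'git', post_import_gc_args(fresh_clone => 0)), "\n";
```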

So it's creating a lot of garbage during its cloning process that can
just be immediately thrown away? What is it doing? Using the object
store as a scratch pad for its own temporary state?

> m
> On Tue, Oct 9, 2018 at 10:49 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Forwarding to Jonathan, as I think this is an interesting supporting
>> vote for the topic that we were stuck on.
>> Eric Wong <e@xxxxxxxxx> writes:
>> > Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
>> >> Hi folks,
>> >>
>> >> Long time no see! Importing a 3GB (~25K revs, tons of files) SVN repo
>> >> I hit the gc error:
>> >>
>> >> warning: There are too many unreachable loose objects; run 'git prune'
>> >> to remove them.
>> >> gc --auto: command returned error: 255
>> >
>> > GC can be annoying when that happens... For git-svn, perhaps
>> > this can be appropriate to at least allow the import to continue:
>> >
>> > diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
>> > index 76b2965905..9b0caa3d47 100644
>> > --- a/perl/Git/SVN.pm
>> > +++ b/perl/Git/SVN.pm
>> > @@ -999,7 +999,7 @@ sub restore_commit_header_env {
>> >  }
>> >
>> >  sub gc {
>> > -     command_noisy('gc', '--auto');
>> > +     eval { command_noisy('gc', '--auto') };
>> >  };
>> >
>> >  sub do_git_commit {
>> >
>> >
>> > But yeah, somebody else who works on git regularly could
>> > probably stop repack from writing thousands of loose
>> > objects (and have it write a self-contained pack with
>> > those objects instead).  I haven't followed git closely
>> > lately, myself.
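Eric's one-line patch works because Git.pm's command_noisy() throws an exception when the git command exits non-zero, and eval { } catches it so the import carries on. Below is a runnable sketch of the same pattern. It uses a hypothetical fake_command_noisy() in place of the real call, so it needs neither git nor Git.pm. Unlike the patch, it also warns with the caught error ($@), so a failing gc is not completely silent.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Git.pm's command_noisy(): dies when the
# "command" fails, the way the real one throws on a non-zero exit.
sub fake_command_noisy {
    my ($exit) = @_;
    die "gc --auto: command returned error: $exit\n" if $exit;
    return;
}

# Same shape as the patched Git::SVN::gc, but reports the error instead
# of discarding it. Returns 1 if gc succeeded, 0 if it was ignored.
sub gc_tolerant {
    my ($exit) = @_;
    my $ok = eval { fake_command_noisy($exit); 1 };
    if (!$ok) {
        my $err = $@ || "unknown error\n";
        warn "ignoring gc failure, continuing import: $err";
        return 0;
    }
    return 1;
}

gc_tolerant(255);
print "import continues\n";
```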