
Re: We should add a "git gc --auto" after "git clone" due to commit graph




On Wed, Oct 03 2018, Jeff King wrote:

> On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote:
>
>> I share these concerns in a slightly more abstract way, as
>> I would bucket the actions into two separate bins:
>>
>> One bin that throws away information.
>> This would include removing expired reflog entries (which
>> I do not think are garbage, or collection thereof, though their
>> usefulness is questionable).
>>
>> The other bin would be actions that optimize but
>> do not throw away any information. Repacking (without
>> dropping files) would be part of it, as would the new
>> "write additional files" actions.
>>
>> Maybe we can move all actions of the second bin into a new
>> "git optimize" command, and git gc would first do the "throw away
>> things" step and then the optimize action, whereas clone would only
>> go for the second, optimizing part?
>
> One problem with that world-view is that some of the operations do
> _both_, for efficiency. E.g., repacking will drop unreachable objects in
> too-old packs. We could actually be more aggressive in combining things
> here. For instance, a full object graph walk in linux.git takes 30-60
> seconds, depending on your CPU. But we do it at least twice during a gc:
> once to repack, and then again to determine reachability for pruning.
>
> If you generate bitmaps during the repack step, you can use them during
> the prune step. But by itself, the cost of generating the bitmaps
> generally outweighs the extra walk. So it's not worth generating them
> _just_ for this (but is an obvious optimization for a server which would
> be generating them anyway).

I don't mean to fan the flames of this obviously controversial "git gc
does optimization" topic (which I didn't expect to be debated at
all...), but a related thing I was wondering about the other day is
whether we could have a gc.fsck option that runs fsck in the background
while we're at it, and reports the results via some facility like
gc.log[1].

That would also fall into this category of more work we could do while
we're doing a full walk anyway but, as with what you're suggesting, it
would require some refactoring.
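
To make the "one walk, several consumers" idea concrete, here is a toy
sketch in Python. It is not how git implements gc, repack, prune or
fsck: the object graph, the names walk_once() and toy_gc(), and the
"missing object" check are all made up for illustration, and it ignores
things a real gc cares about (expiry times, packs, bitmaps). It only
shows a single traversal feeding the repack list, the prune list and an
fsck-style report.

```python
from collections import deque

# Hypothetical object graph: object id -> ids it references.
# "blob_missing" is referenced but absent, "blob_old" is unreachable.
OBJECTS = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree2": ["blob_a", "blob_missing"],
    "tree1": ["blob_a"],
    "blob_a": [],
    "blob_old": [],
}


def walk_once(objects, tips):
    """Walk the graph once from the given tips.

    Returns the set of reachable object ids and a list of
    fsck-style problems noticed along the way.
    """
    seen = set()
    reachable = set()
    problems = []
    queue = deque(tips)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in objects:
            # Connectivity check piggybacked on the same walk.
            problems.append(f"missing object: {oid}")
            continue
        reachable.add(oid)
        queue.extend(objects[oid])
    return reachable, problems


def toy_gc(objects, tips):
    """Use one walk for 'repack', 'prune' and the problem report."""
    reachable, problems = walk_once(objects, tips)
    to_pack = sorted(reachable)                  # "repack" input
    to_prune = sorted(set(objects) - reachable)  # "prune" candidates
    return to_pack, to_prune, problems
```

Calling toy_gc(OBJECTS, ["commit2"]) walks the graph a single time and
hands back all three results; in a gc.fsck world the problems list is
the sort of thing that would end up in the log.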

1. Well, one that doesn't suck, see
   https://public-inbox.org/git/87inc89j38.fsf@xxxxxxxxxxxxxxxxxxx/ and
   https://public-inbox.org/git/87d0vmck55.fsf@xxxxxxxxxxxxxxxxxxx/ etc.