
Re: We should add a "git gc --auto" after "git clone" due to commit graph




On Wed, Oct 03 2018, SZEDER Gábor wrote:

> On Wed, Oct 03, 2018 at 04:22:12PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Oct 03 2018, SZEDER Gábor wrote:
>>
>> > On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Wed, Oct 03 2018, SZEDER Gábor wrote:
>> >>
>> >> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> >> Don't have time to patch this now, but thought I'd send a note / RFC
>> >> >> about this.
>> >> >>
>> >> >> Now that we have the commit graph it's nice to be able to set
>> >> >> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>> >> >> /etc/gitconfig to apply them to all repos.
>> >> >>
>> >> >> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>> >> >> until whenever my first "gc" kicks in, which may be quite some time if
>> >> >> I'm just using it passively.
>> >> >>
>> >> >> So we should make "git gc --auto" be run on clone,
>> >> >
>> >> > There is no garbage after 'git clone'...
>> >>
>> >> "git gc" is really "git gc-or-create-indexes" these days.
>> >
>> > Because it happens to be convenient to create those indexes at
>> > gc-time.  But that should not be an excuse to run gc when by
>> > definition no gc is needed.
>>
>> Ah, I thought you just had an objection to the "gc" name being used for
>> non-gc stuff,
>
> But you thought right, I do have an objection against that.  'git gc'
> should, well, collect garbage.  Any non-gc stuff is already violating
> separation of concerns.

Ever since git-gc was added back in 30f610b7b0 ("Create 'git gc' to
perform common maintenance operations.", 2006-12-27) it has been
described as:

    git-gc - Cleanup unnecessary files and optimize the local repository

Creating indexes such as the commit-graph falls under "optimize the
local repository", and third-party tools (e.g. the repo tool, whose use
of this came up on the list recently) have been calling "gc --auto" on
that assumption.

>>  but if you mean we shouldn't do a giant repack right after
>> clone I agree.
>
> And, I also mean that since 'git clone' knows that there can't
> possibly be any garbage in the first place, then it shouldn't call 'gc
> --auto' at all.  However, since it also knows that there is a lot of
> new stuff, then it should create a commit-graph if enabled.

Is this something you think just because the tool isn't called
git-gc-and-optimize, or do you think this regardless of what it's
called?

I don't see how splitting up the entry points for "detect if we need to
cleanup or optimize the repo" leaves us with a better codebase for the
reasons noted in
https://public-inbox.org/git/87pnwrgll2.fsf@xxxxxxxxxxxxxxxxxxx/
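
For illustration only, here is roughly what a user can do by hand today to
get a commit-graph right after a clone without waiting for the first "gc".
This is a sketch, not something proposed in this thread: the throwaway
source repository, the temporary paths, and the example identity are all
made up so that the snippet is self-contained, and it assumes a Git new
enough to have "commit-graph write --reachable" (2.19 or later).

```shell
#!/bin/sh
set -e

# Hypothetical scratch locations, used only for this demonstration.
tmp=$(mktemp -d)
src="$tmp/src"
clone="$tmp/clone"

# Create a tiny repository to clone from (stand-in for e.g. linux.git).
git init -q "$src"
git -C "$src" -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "initial"

# A fresh clone has no garbage, but also no commit-graph yet.
git clone -q "$src" "$clone"

# Enable reading the commit-graph in this clone only (local config).
git -C "$clone" config core.commitGraph true

# Write the commit-graph for all commits reachable from refs,
# without running any repack or other gc work.
git -C "$clone" commit-graph write --reachable
```

After this the graph file sits at .git/objects/info/commit-graph in the
clone, which is the same file "gc" would produce with
gc.writeCommitGraph=true.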