Web lists-archives.com

Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

On 10/4/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote:
I don't have time to polish this up for submission now, but here's a WIP
patch that implements this, highlights:

  * There's a gc.clone.autoDetach=false default setting which overrides
    gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a
    --cloning option to indicate this).

I'll repeat that it could make sense to do the same thing on clone _and_ fetch. Perhaps a "--post-fetch" flag would be good here to communicate that we just downloaded a pack from a remote.

  * A clone of say git.git with gc.writeCommitGraph=true looks like:

    Receiving objects: 100% (255262/255262), 100.49 MiB | 17.78 MiB/s, done.
    Resolving deltas: 100% (188947/188947), done.
    Computing commit graph generation numbers: 100% (55210/55210), done.

This looks like good UX. Thanks for the progress here!

  * The 'git gc --auto' command also knows to (only) run the commit-graph
    (and space is left for future optimization steps) if general GC isn't
    needed, but we need "optimization":

    $ rm .git/objects/info/commit-graph; ~/g/git/git --exec-path=$PWD -c gc.writeCommitGraph=true -c gc.autoDetach=false gc --auto;
    Annotating commits in commit graph: 341229, done.
    Computing commit graph generation numbers: 100% (165969/165969), done.

Will this also trigger a full commit-graph rewrite on every 'git commit' command? Or is there some way we can compute the staleness of the commit-graph in order to only update if we get too far ahead? Previously, this was solved by relying on the auto-GC threshold.

  * The patch to gc.c looks less scary with -w, most of it is indenting
    the existing pack-refs etc. with a "!auto_gc || should_gc" condition.

  * I added a commit_graph_exists() exists function and only care if I
    get ENOENT for the purposes of this gc mode. This would need to be
    tweaked for the incremental mode Derrick talks about, but if we just
    set "should_optimize" that'll also work as far as gc --auto is
    concerned (e.g. on fetch, am etc.)

The incremental mode would operate the same as split-index, which means we will still look for .git/objects/info/commit-graph. That file may point us to more files.

+int commit_graph_exists(const char *graph_file)
+	struct stat st;
+	if (stat(graph_file, &st)) {
+		if (errno == ENOENT)
+			return 0;
+		else
+			return -1;
+	}
+	return 1;

This method serves a very similar purpose to generation_numbers_enabled(), except your method only cares about the file existing. It ignores information like `core.commitGraph`, which should keep us from doing anything with the commit-graph file if false.

Nothing about your method is specific to the commit-graph file, since you provide a filename as a parameter. It could easily be "int file_exists(const char *filename)".