Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph

On 10/5/2018 3:21 PM, Jeff King wrote:
On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote:

My misunderstanding was that your proposed change to gc computes the
commit-graph in either of these two cases:

(1) The auto-GC threshold is met.

(2) There is no commit-graph file.

And what I hope to have instead of (2) is (3):

(3) The commit-graph file is "sufficiently behind" the tip refs.

This condition is intentionally vague at the moment. It could be that we
hint that (3) holds by saying "--post-fetch" (i.e. "We just downloaded a
pack, and it probably contains a lot of new commits") or we could create
some more complicated condition based on counting reachable commits with
infinite generation number (the number of commits not in the commit-graph).
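
As a rough illustration of the "counting reachable commits" idea, here is a minimal sketch. It is not Git's implementation. The names `count_uncovered_commits`, `parents`, `in_graph`, and `limit` are hypothetical, and it assumes the commit-graph is closed under reachability, so the walk can stop at any commit that is already in the file.

```python
from collections import deque


def count_uncovered_commits(tips, parents, in_graph, limit=None):
    """Count commits reachable from `tips` that are not in the commit-graph.

    tips:     iterable of commit ids (the ref tips)
    parents:  dict mapping a commit id to a list of its parent ids
    in_graph: set of commit ids already stored in the commit-graph
    limit:    optional cap; stop counting once this many are found

    Assumption: `in_graph` is closed under reachability, so every ancestor
    of a covered commit is also covered and need not be visited.
    """
    seen = set()
    queue = deque(tips)
    count = 0
    while queue:
        commit = queue.popleft()
        if commit in seen or commit in in_graph:
            continue  # already counted, or covered by the commit-graph
        seen.add(commit)
        count += 1
        if limit is not None and count >= limit:
            return count  # enough evidence that the graph is behind
        queue.extend(parents.get(commit, ()))
    return count
```

With a `limit`, the walk costs at most about as many commit lookups as the threshold itself, which is what would make such a check cheap enough to run after every fetch.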

I like that you are moving forward to make the commit-graph be written more
frequently, but I'm trying to push us in a direction of writing it even more
often than your proposed strategy. We should avoid creating too many
orthogonal conditions that trigger the commit-graph write, which is why I'm
pushing on your design here.

Anyone else have thoughts on this direction?

Yes, I think measuring "sufficiently behind" is the right thing.
Everything else is a proxy or heuristic, and will run into corner cases.
E.g., I have some small number of objects and then do a huge fetch, and
now my commit-graph only covers 5% of what's available.

We know how many objects are in the graph already. And it's not too
expensive to get the number of objects in the repository. We can do the
same sampling for loose objects that "gc --auto" does, and counting
packed objects just involves opening up the .idx files (that can be slow
if you have a ton of packs, but you'd want to either repack or use a
.midx in that case anyway, either of which would help here).
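
To make "just opening up the .idx files" concrete, here is a hedged sketch that reads the object count from a single pack index. It relies on the pack index layout, in which the last of the 256 fan-out entries holds the total number of objects in the pack. The function name `idx_object_count` is hypothetical, and the sketch does not handle .midx files.

```python
import struct

IDX_V2_MAGIC = b"\xfftOc"  # signature that starts a version 2 pack index
FANOUT_SIZE = 256 * 4      # 256 big-endian 32-bit cumulative counts


def idx_object_count(path):
    """Return the number of objects recorded in one pack .idx file."""
    with open(path, "rb") as f:
        header = f.read(8)
        if header[:4] == IDX_V2_MAGIC:
            (version,) = struct.unpack(">I", header[4:8])
            if version != 2:
                raise ValueError("unsupported idx version: %d" % version)
            fanout = f.read(FANOUT_SIZE)
        else:
            # Version 1 has no header; the file starts with the fan-out table.
            fanout = header + f.read(FANOUT_SIZE - len(header))
    if len(fanout) != FANOUT_SIZE:
        raise ValueError("truncated idx file")
    # The final fan-out entry is the total object count for the pack.
    return struct.unpack(">I", fanout[-4:])[0]
```

Summing this over every .idx file in the pack directory gives the packed-object count; the cost grows with the number of packs, not the number of objects.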

So can we really just take (total_objects - commit_graph_objects) and
compare it to some threshold?

The commit-graph only stores the number of _commits_, not total objects.

Azure Repos' commit-graph does store the total number of objects, and that is how we trigger updating the graph, so it is not unreasonable to use that as a heuristic.
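
For illustration only, the threshold comparison discussed above could look like the sketch below. It assumes the commit-graph records the object count from its last write (which, as noted, Git's file does not do today), and `graph_needs_update` and its parameters are hypothetical names, not an existing Git or Azure Repos interface.

```python
def graph_needs_update(total_objects, graph_objects, threshold):
    """Return True when the commit-graph is "sufficiently behind".

    total_objects: approximate number of objects now in the repository
    graph_objects: object count recorded when the graph was last written
                   (assumed to be stored alongside the graph)
    threshold:     how many unaccounted-for objects to tolerate
    """
    # Clamp at zero: after pruning, the repository can hold fewer objects
    # than it did when the graph was written.
    behind = max(total_objects - graph_objects, 0)
    return behind > threshold
```

In the "huge fetch" case, where the graph covers only 5% of the repository, a call such as `graph_needs_update(100_000, 5_000, 10_000)` would report that a rewrite is due.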