Web lists-archives.com

Re: Revision walking, commit dates, slop

Derrick Stolee <stolee@xxxxxxxxx> writes:
> On 5/20/2019 7:02 AM, Jakub Narebski wrote:
>> Are there any blockers that prevent the switch to this
>> "generation number v2"?
>> - Is it a problem with insufficient data to choose the correct numbering
>>   as "generation number v2' (there can be only one)?
>> - Is it a problem with selected "generation number v2" being
>>   incompatibile with gen v2, and Git failing when new version of
>>   commit-graph is used instead of softly just not using commit-graph?
>> - Or is it something else?

Thanks for the explanation.

> The switch becomes a bit complicated.
> First, the plan was to version the commit-graph file to v2, and that would
> include a byte in the header for the "reachability index version" [1]. Since
> older clients fail hard on a newer file version, we are switching to instead
> including the reachability index version as a value in a more flexible 
> "metadata chunk" [2].

Ugh, that is bad.  The version number inn the commit-graph file format
was supposed to make it possible to easily change the format; now it
looks like we are stuck with workarounds until the released version that
dies on never commit-graph file format version innstead of softly not
utilizing the commit-graph file dies off.

How this issue got missed in review...

If we cannot change the format, all that is left is ading new chunks,
and changes that conform to commit-graph file version 1.  

>                      Using the generation number column for the corrected
> commit-date offsets (assuming we also guarantee the offset is strictly
> increasing from parent to child), these new values will be backwards-
> compatible _except_ for 'git commit-graph verify'.

O.K., so the "generation number v2 (legacy)" would be incremental and
backward-compatibile in use (though not in generation and validation).

Do I understand it correctly how it is calculated:

  corrected_date(C) = max(committer_date(C),
                          max_{P ∈ parents(C)}(corrected_date(P)) + 1)
  offset(C) = corrected_date(C) - committer_date(C)
  gen_v2(C) = max(offset(C), max_{P ∈ parents(C)}(gen_v2(P)) + 1) 

Do you have benchmark for this "monotonically offset corrected commit
date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
and https://github.com/derrickstolee/gen-test ?

Also, what would happen if different versions of Git tried to add to
commit-graph in interleaved way, either with rewrite or incremental?

> Second, we need to pull the reachability index value into a commit slab.

Is commit slab documented somewhere in Documentation/technical/, or just
in comments in commit-slab.h?

As I understand it, commit slab is Git-specific implementition of
inside-out object storage for commit data (i.e. struct of arrays instead
of array of structs), isn't it?  I wonder if using commit slab improves
cache utilization...

> The generation value is currently 32 bits, but we will expand that to
> 64 as it stores a timestamp. The commit struct is a bit bloated already,
> so this will reduce the required memory space even when not using the
> commit-graph. But, it requires some refactoring, including every place
> where we pass a "min_generation" needs to change type and name.

Could this be done with Coccinelle's spatch, similar to
e.g. contrib/coccinelle/commit.cocci?

> Third and finally, we need to calculate the new values and change the
> read logic to sum the offset and commit-date (when the metadata chunk
> says we are using corrected commit date).


> While none of this is _incredibly_ hard to do, it does require a bit
> of care. It's on my list to get to at some point, but making the file
> incremental is higher priority to me.

All right, I can understand that.

> [1] https://public-inbox.org/git/pull.112.git.gitgitgadget@xxxxxxxxx/
> [2] https://public-inbox.org/git/87h8acivkh.fsf@xxxxxxxxxxxxxxxxxxx/

Jakub Narębski