Web lists-archives.com

Re: Revision walking, commit dates, slop




SZEDER Gábor <szeder.dev@xxxxxxxxx> writes:
> On Sat, May 18, 2019 at 01:17:06PM +0900, Mike Hommey wrote:
>> On Sat, May 18, 2019 at 12:58:28PM +0900, Mike Hommey wrote:
>>> On Sat, May 18, 2019 at 03:50:05AM +0200, SZEDER Gábor wrote:
>>>> On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:
[...]
>>>>> There are established corner cases, where in a repo where commit dates
>>>>> are not monotonically increasing, revision walking can go horribly
>>>>> wrong. This was discussed in the past in e.g.
>>>>> https://public-inbox.org/git/20150521061553.GA29269@xxxxxxxxxxxx/
>>>>> 
>>>>> The only (simple) workable way, given the current algorithm, to get an
>>>>> accurate view off rev-list is to essentially make slop infinite. This
>>>>> works fine, at the expense of runtime.
>>>>> 
>>>>> Now, ignoring any modification for the above, I'm hitting another corner
>>>>> case in some other "weird" history, where I have 500k commits all with
>>>>> the same date. With such a commit dag, something as trivial as
>>>>> `git rev-list HEAD~..HEAD` goes through all commits from the root commit
>>>>> to HEAD, which takes multiple seconds, when the (obvious) output is one
>>>>> commit.
[...]
>>>> All the above is without commit-graph, I presume?  If so, then you
>>>> should give it a try, as it might bring immediate help in your
>>>> pathological repo.  With 5k commit in the same second (enforced via
>>>> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:
>>>> 
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.069
>>>>   $ git commit-graph write --reachableComputing commit graph generation
>>>>   numbers: 100% (5000/5000), done.
>>>>   $ best-of-five -q git rev-list HEAD~..HEAD
>>>>   0.004
>>> 
>>> I'm not observing any difference from using commit-graph, whether in
>>> time or in the number of commits that are looked at in limit_list().
>> 
>> -c core.commitGraph=true does make a difference in time, but not in the
>> number of commits looked at in limit_list(). So it's only faster because
>> each iteration of the loop is faster. It means it's still dependent on
>> the depth of the dag, and the larger the repo will grow, the slower it
>> will get.
>
> Oh, indeed.  Well, at least you'll waste about an order of magnitude
> less processor time until you figure out how to fix it :)
>
> Btw, once upon a time this was fast, but it became slow with commit
> c19d1b4e84 (Fix revision walk for commits with the same dates,
> 2013-03-22).

This might be related to the fact, that with generation numbers v1
(i.e. topological levels) there were cases when using those generation
numbers for ordering caused performance regressions (e.g. for some parts
of Linux kernel history).

There was an idea of replacing them with generation numbers v2 [1],
which if I remember correctly was chosen to be corrected date (that is,
committer date or one more than maximum of dates of parents).  This is
incompatibile change; it turned out however that while commit-graph
format is versioned, unfortunately Git fails hard if commit-graph
version is too new, instead of falling back to not using commit graph.

[1]: https://github.com/derrickstolee/gen-test

Stolee, could you tell us what is the current status on generation
numbers v2?  Thanks in advance.

Best,
--
Jakub Narębski