Web lists-archives.com

Re: [PATCH 0/3] Make add_missing_tags() linear




On Wed, Oct 31, 2018 at 5:05 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 10/31/2018 2:04 AM, Elijah Newren wrote:
> > On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@xxxxxxxxx> wrote:
> >>
> >> As reported earlier [1], the add_missing_tags() method in remote.c has
> >> quadratic performance. Some of that performance is curbed due to the
> >> generation-number cutoff in in_merge_bases_many(). However, that fix doesn't
> >> help users without a commit-graph, and it can still be painful if that
> >> cutoff is sufficiently low compared to the tags we are using for
> >> reachability testing.
> >>
> >> Add a new method in commit-reach.c called get_reachable_subset() which does
> >> a many-to-many reachability test. Starting at the 'from' commits, walk until
> >> the generation is below the smallest generation in the 'to' commits, or all
> >> 'to' commits have been discovered. This performs only one commit walk for
> >> the entire add_missing_tags() method, giving linear performance in the worst
> >> case.
> >>
> >> Tests are added in t6600-test-reach.sh to ensure get_reachable_subset()
> >> works independently of its application in add_missing_tags().
> >
> > On the original repo where the topic was brought up, with commit-graph
> > NOT turned on and using origin/master, I see:
> >
> > $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> > To /home/newren/repo-mirror
> >  * [new branch]       test5 -> test5
> >
> > real 1m20.081s
> > user 1m19.688s
> > sys 0m0.292s
> >
> > Merging this series in, I now get:
> >
> > $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> > To /home/newren/repo-mirror
> >  * [new branch]       test5 -> test5
> >
> > real 0m2.857s
> > user 0m2.580s
> > sys 0m0.328s
> >
> > which provides a very nice speedup.
> >
> > Oddly enough, if I _also_ do the following:
> > $ git config core.commitgraph true
> > $ git config gc.writecommitgraph true
> > $ git gc
> >
> > then my timing actually slows down just slightly:
> > $ time git push --follow-tags --dry-run /home/newren/repo-mirror
> > To /home/newren/repo-mirror
> >  * [new branch]          test5 -> test5
> >
> > real 0m3.027s
> > user 0m2.696s
> > sys 0m0.400s
>
> So you say that the commit-graph is off in the 2.8s case, but not here
> in the 3.1s case? I would expect _at minimum_ that the cost of parsing
> commits would have a speedup in the commit-graph case.  There may be
> something else going on here, since you are timing a `push` event that
> is doing more than the current walk.
>
> > (run-to-run variation seems pretty consistent, < .1s variation, so
> > this difference is just enough to notice.)  I wouldn't be that
> > surprised if that means there's some really old tags with very small
> > generation numbers, meaning it's not gaining anything in this special
> > case from the commit-graph, but it does pay the cost of loading the
> > commit-graph.
>
> While you have this test environment, do you mind applying the diff
> below and re-running the tests? It will output a count for how many
> commits are walked by the algorithm. This should help us determine if
> this is another case where generation numbers are worse than commit-date,
> or if there is something else going on. Thanks!

I can do that, but wouldn't you want a similar patch for the old
get_merge_bases_many() in order to compare?  Does an absolute number
help by itself?
It's going to have to be tomorrow, though; not enough time tonight.