Web lists-archives.com

Re: [PATCH 0/3] Make add_missing_tags() linear




On 10/31/2018 2:04 AM, Elijah Newren wrote:
> On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
>>
>> As reported earlier [1], the add_missing_tags() method in remote.c has
>> quadratic performance. Some of that performance is curbed due to the
>> generation-number cutoff in in_merge_bases_many(). However, that fix doesn't
>> help users without a commit-graph, and it can still be painful if that
>> cutoff is sufficiently low compared to the tags we are using for
>> reachability testing.
>>
>> Add a new method in commit-reach.c called get_reachable_subset() which does
>> a many-to-many reachability test. Starting at the 'from' commits, walk until
>> the generation is below the smallest generation in the 'to' commits, or all
>> 'to' commits have been discovered. This performs only one commit walk for
>> the entire add_missing_tags() method, giving linear performance in the worst
>> case.
>>
>> Tests are added in t6600-test-reach.sh to ensure get_reachable_subset()
>> works independently of its application in add_missing_tags().
>
> On the original repo where the topic was brought up, with commit-graph
> NOT turned on and using origin/master, I see:
>
> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> To /home/newren/repo-mirror
>  * [new branch]       test5 -> test5
>
> real 1m20.081s
> user 1m19.688s
> sys 0m0.292s
>
> Merging this series in, I now get:
>
> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> To /home/newren/repo-mirror
>  * [new branch]       test5 -> test5
>
> real 0m2.857s
> user 0m2.580s
> sys 0m0.328s
>
> which provides a very nice speedup.
>
> Oddly enough, if I _also_ do the following:
> $ git config core.commitgraph true
> $ git config gc.writecommitgraph true
> $ git gc
>
> then my timing actually slows down just slightly:
> $ time git push --follow-tags --dry-run /home/newren/repo-mirror
> To /home/newren/repo-mirror
>  * [new branch]          test5 -> test5
>
> real 0m3.027s
> user 0m2.696s
> sys 0m0.400s

So you say that the commit-graph is off in the 2.8s case, but not here
in the 3.1s case? I would expect _at minimum_ that the cost of parsing
commits would have a speedup in the commit-graph case.  There may be
something else going on here, since you are timing a `push` event that
is doing more than the current walk.

> (run-to-run variation seems pretty consistent, < .1s variation, so
> this difference is just enough to notice.)  I wouldn't be that
> surprised if that means there's some really old tags with very small
> generation numbers, meaning it's not gaining anything in this special
> case from the commit-graph, but it does pay the cost of loading the
> commit-graph.

While you have this test environment, do you mind applying the diff
below and re-running the tests? It will output a count for how many
commits are walked by the algorithm. This should help us determine if
this is another case where generation numbers are worse than commit-date,
or if there is something else going on. Thanks!

-->8--

>From 2115e7dcafb2770455b7b4793f90edc2254bad97 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
Date: Wed, 31 Oct 2018 11:40:50 +0000
Subject: [PATCH] DO-NOT-MERGE: count commits in get_reachable_subset

Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
---
 commit-reach.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index a98532ecc8..b512461cf7 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -700,6 +700,7 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 	struct commit **from_last = from + nr_from;
 	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 	int num_to_find = 0;
+	int count = 0;
 
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 
@@ -729,6 +730,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 	while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
 		struct commit_list *parents;
 
+		count++;
+
 		if (current->object.flags & PARENT1) {
 			current->object.flags &= ~PARENT1;
 			current->object.flags |= reachable_flag;
@@ -755,6 +758,8 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 	clear_commit_marks_many(nr_to, to, PARENT1);
 	clear_commit_marks_many(nr_from, from, PARENT2);
 
+	fprintf(stderr, "count: %d\n", count);
+
 	return found_commits;
 }
 
-- 
2.19.1.542.gc4df23f792