Web lists-archives.com

Re: [PATCH 1/3] commit-reach: implement get_reachable_subset

On 10/30/2018 11:35 PM, Junio C Hamano wrote:
"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

+struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
+					 struct commit **to, int nr_to,
+					 int reachable_flag)
This is OR'ed into object.flags, and I somoehow expected to see it
as 'unsigned int', not a signed one.

Will do. Thanks.

+	struct commit **item;
+	struct commit *current;
+	struct commit_list *found_commits = NULL;
+	struct commit **to_last = to + nr_to;
+	struct commit **from_last = from + nr_from;
+	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+	int num_to_find = 0;
+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	for (item = to; item < to_last; item++) {
+		struct commit *c = *item;
+		parse_commit(c);
+		if (c->generation < min_generation)
+			min_generation = c->generation;
+		if (!(c->object.flags & PARENT1)) {
+			c->object.flags |= PARENT1;
+			num_to_find++;
+		}
+	}
+	for (item = from; item < from_last; item++) {
+		struct commit *c = *item;
+		if (!(c->object.flags & PARENT2)) {
+			c->object.flags |= PARENT2;
+			parse_commit(c);
+			prio_queue_put(&queue, *item);
+		}
+	}
OK, we marked "to" with PARENT1 and counted them in num_to_find
without dups.  We also marked "from" with PARENT2 and threw them in
the "queue" without dups.

Mental note: the caller must guarantee that everybody reachable from
"to" and "from" have PARENT1 and PARENT2 clear.  This might deserve
to be in the comment before the function.

I'll put that in the header file.

OK, this all makes sense.  Unlike merge-base traversals, this does
not have to traverse from the "to" side at all, which makes it quite
simpler and straight-forward.

I do wonder if we can now reimplement in_merge_bases_many() in terms
of this helper, and if that gives us a better performance.  It asks
"is 'commit', i.e. a single 'to', an ancestor of, i.e. reachable
from, one of the 'references', i.e.  'from'"?

We could do this, but it does come with a performance hit when the following
are all true:

1. 'to' is not reachable from any 'from' commits.

2. The 'to' and 'from' commits are close in commit-date.

3. Generation numbers are not available, or the topology is skewed to have
   commits with high commit date and low generation number.

Since in_merge_bases_many() calls paint_down_to_common(), it has the same
issues with the current generation numbers. This can be fixed when we have
the next version of generation numbers available.

I'll make a note to have in_merge_bases_many() call get_reachable_subset()
conditionally (like the generation_numbers_available() trick in the --topo-order
series) after the generation numbers are settled and implemented.