Web lists-archives.com

Re: [PATCH] clone: do faster object check for partial clones




> For partial clones, doing a full connectivity check is wasteful; we skip
> promisor objects (which, for a partial clone, is all known objects), and
> excluding them all from the connectivity check can take a significant
> amount of time on large repos.

Instead of "excluding them all", I would word this as "enumerating them
all so that we can exclude them" - the enumerating is the slow part, not
the excluding (which actually makes things faster).

> +	if (opt->check_refs_only) {
> +		/*
> +		 * For partial clones, we don't want to walk the full commit
> +		 * graph because we're skipping promisor objects anyway. We
> +		 * should just check that objects referenced by wanted refs were
> +		 * transferred.

The enumeration of promisor objects to be excluded is done through
for_each_packed_object() (see is_promisor_object()), not through walking
the commit graph. Maybe reword this comment to be similar to what I've
suggested for the commit message.

> @@ -46,6 +46,14 @@ struct check_connected_options {
>  	 * during a fetch.
>  	 */
>  	unsigned is_deepening_fetch : 1;
> +
> +	/*
> +	 * If non-zero, only check the top-level objects referenced by the
> +	 * wanted refs (passed in as cb_data). This is useful for partial
> +	 * clones, where this can be much faster than excluding all promisor
> +	 * objects prior to walking the commit graph.
> +	 */
> +	unsigned check_refs_only : 1;
>  };

Same enumerating vs excluding comment as before.

Aside from that: thinking from scratch, we want something that tells
check_connected() to avoid anything that enumerates the list of promised
objects, since the objects that we're checking are all promisor objects,
and thus any outgoing links are automatically promised. I would include
some of this explanation in the comment, but in the interest of trying
to avoid a bikeshedding discussion, I don't consider this necessary.