Re: [PATCH v2] rev-list: exclude promisor objects at walk time

On 2019.04.04 20:00, Jeff King wrote:
> On Thu, Apr 04, 2019 at 04:47:26PM -0700, Josh Steadmon wrote:
> 
> > > Did you (or anybody else) have any thoughts on the case where a given
> > > object is referred to both by a promisor and a non-promisor (and we
> > > don't have it)? That's the "shortcut" I think we're taking here: we
> > > would no longer realize that it's available via the promisor when we
> > > traverse to it from the non-promisor. I'm just not clear on whether that
> > > can ever happen.
> > 
> > I am not sure either. In process_blob() and process_tree() there are
> > additional checks for whether missing blobs/trees are promisor objects
> > using is_promisor_object()...  but if we call that we undo the
> > performance gains from this change.
> 
> Hmm. That might be a good outcome, though. If it never happens, we're
> fast. If it does happen, then our worst case is that we fall back to the
> current slower-but-more-thorough check. (And I think that happens with
> your patch, without us having to do anything further).
> 
> > > One other possible small optimization: we don't look up the object
> > > unless the caller asked to exclude promisors, which is good. But we
> > > could also keep a single flag for "is there a promisor pack at all?".
> > > When there isn't, we know there's no point in looking for the object.
> > [...]
> > I'm not necessarily opposed, but I'm leaning towards the "won't matter
> > much" side.
> > 
> > Where would such a flag live, in this case, and who would be responsible
> > for initializing it? I guess it would only matter for rev-list, so we
> > could initialize it in cmd_rev_list() if --exclude-promisor-objects is
> > passed?
> 
> The check is really something like:
> 
>   int have_promisor_pack(void) {
> 	struct packed_git *p;
> 	for (p = packed_git; p; p = p->next) {
> 		if (p->pack_promisor)
> 			return 1;
> 	}
> 	return 0;
>   }
> 
> That could be lazily cached as a single bit, but it would need to be
> reset whenever we call reprepare_packed_git().
> 
> Let's just punt on it for now. I'm not convinced it would actually yield
> any benefit, unless we have a partial-clone repo that doesn't have any
> promisor packs (but then, I suspect whatever un-partial'd it should
> probably be resetting the partial flag in the config).
> 
> > > I didn't see any tweaks to the callers, which makes sense; we're already
> > > passing --exclude-promisor-objects as necessary. Which means by itself,
> > > this patch should be making things faster, right? Do you have timings to
> > > show that off?
> > 
> > Yeah, for a partial clone of a large-ish Android repo [1], we see the
> > connectivity check go from >180s to ~7s.
> 
> Those are nice numbers. :) Worth mentioning in the commit message, I
> think. How does it compare to your earlier patch? I'd hope they're about
> the same.

Thanks, will include them in the v3 commit message. Unfortunately it's
hard to compare against v1, because v1 doesn't call rev-list at all, and
thus we don't have a good measurement in the trace / trace2 output. The
rev-list timing has been pretty consistent at just a bit over 3 minutes,
but the overall clone takes anywhere from 12-20 minutes, so any
difference between v1 and v2 performance just gets lost in the noise. If
I get a chance on Monday I may go back to v1 and add some timing.
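
For reference, the lazily cached flag discussed in the quoted text above
(computed on first use, reset when the pack list is re-read) might look
roughly like the sketch below. This is not code from the patch or from
git itself: struct pack, struct pack_store, pack_store_init() and
pack_store_reprepare() are hypothetical stand-ins for struct packed_git,
the packed-object store and reprepare_packed_git(), and the cache uses a
tri-state int rather than a single bit so that "not computed yet" can be
told apart from "no promisor pack".

```c
#include <stddef.h>

/* Hypothetical stand-in for struct packed_git; only the fields used here. */
struct pack {
	struct pack *next;
	unsigned pack_promisor : 1;
};

/* Hypothetical stand-in for the pack list plus its cached answer. */
struct pack_store {
	struct pack *packs;
	int promisor_cache; /* -1 = unknown, 0 = no promisor pack, 1 = at least one */
};

static void pack_store_init(struct pack_store *s, struct pack *packs)
{
	s->packs = packs;
	s->promisor_cache = -1;
}

/* Walk the pack list only on the first call after init or reprepare. */
static int have_promisor_pack(struct pack_store *s)
{
	if (s->promisor_cache < 0) {
		struct pack *p;

		s->promisor_cache = 0;
		for (p = s->packs; p; p = p->next) {
			if (p->pack_promisor) {
				s->promisor_cache = 1;
				break;
			}
		}
	}
	return s->promisor_cache;
}

/*
 * Stand-in for re-reading the pack list: the cached answer may be
 * stale afterwards, so it has to be invalidated here.
 */
static void pack_store_reprepare(struct pack_store *s, struct pack *packs)
{
	s->packs = packs;
	s->promisor_cache = -1;
}
```

The only subtle part is the invalidation: any path that changes the pack
list without resetting the cache would leave have_promisor_pack()
returning a stale answer.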