Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
- Date: Thu, 16 May 2019 11:26:46 -0700
- From: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
- Subject: Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases
> On Tue, May 14, 2019 at 02:10:55PM -0700, Jonathan Tan wrote:
> > Support for lazy fetching should still generally be turned off in
> > index-pack because it is used as part of the lazy fetching process
> > itself (if not, infinite loops may occur), but we do need to fetch the
> > REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
> > those are REF_DELTA themselves, because we do not send "have" when
> > making such fetches.)
> I agree that the current implementation (and probably any sane
> implementation) would not send us a delta if we have not provided any
> haves. But this does mean that a malicious server could send a client
> into an infinite loop.
> Pretty unlikely, but should we put some kind of circuit-breaker into the
> client to ensure this?
I thought of this - such a server could, but it seems to me that it
would be similar to a server streaming random bytes to us without
stopping (which is already possible).
> > To resolve this, prefetch all missing REF_DELTA bases before attempting
> > to resolve them. This both ensures that all bases are attempted to be
> > fetched, and ensures that we make only one request per index-pack
> > invocation, and not one request per missing object.
> Ah, but now things get more tricky.
> You are assuming that the server does not ever send a REF_DELTA unless
> the base object is not present in the pack (it would use OFS_DELTA
> instead). If we imagine a server which did, then there are two
> 1. We might pre-fetch a full copy of an object that we don't need.
> It's just that it's stored as a delta in the pack which we are
> currently indexing.
> 2. If we pre-fetch multiple objects, some of them may be REF_DELTAs
> against each other, leading to an infinite loop.
> Off the top of my head, I am pretty sure your assumption holds for all
> versions of Git that support delta-base-offset. But that feels a lot
> less certain to me. I could imagine an alternate server implementation,
> for example, that is gluing together packs and does not try hard to
> order the base before the delta, which would require it to use REF_DELTA
> instead of OFS_DELTA.
A cursory glance makes me think that REF_DELTA against a base object
also in the pack is already correctly handled. Right before the
invocation of conclude_pack() (which calls fix_unresolved_deltas(), the
function I modified), resolve_deltas() is invoked. The latter invokes
resolve_base() (directly or through threaded_second_pass()) which
invokes find_unresolved_deltas(), which invokes
find_unresolved_deltas_1(), which seems to handle both REF_DELTA and
Snipping the rest as I don't think we need to solve those if we can
handle REF_DELTA being against an object in a pack, but let me know if
you think that some of the points there still need to be addressed.