Web lists-archives.com

Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases




> On Tue, May 14, 2019 at 02:10:55PM -0700, Jonathan Tan wrote:
> 
> > Support for lazy fetching should still generally be turned off in
> > index-pack because it is used as part of the lazy fetching process
> > itself (if not, infinite loops may occur), but we do need to fetch the
> > REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
> > those are REF_DELTA themselves, because we do not send "have" when
> > making such fetches.)
> 
> I agree that the current implementation (and probably any sane
> implementation) would not send us a delta if we have not provided any
> haves. But this does mean that a malicious server could send a client
> into an infinite loop.
> 
> Pretty unlikely, but should we put some kind of circuit-breaker into the
> client to ensure this?

I thought of this - such a server could, but it seems to me that it
would be similar to a server streaming random bytes to us without
stopping (which is already possible).

> > To resolve this, prefetch all missing REF_DELTA bases before attempting
> > to resolve them. This both ensures that all bases are attempted to be
> > fetched, and ensures that we make only one request per index-pack
> > invocation, and not one request per missing object.
> 
> Ah, but now things get more tricky.
> 
> You are assuming that the server does not ever send a REF_DELTA unless
> the base object is not present in the pack (it would use OFS_DELTA
> instead). If we imagine a server which did, then there are two
> implications:
> 
>   1. We might pre-fetch a full copy of an object that we don't need.
>      It's just that it's stored as a delta in the pack which we are
>      currently indexing.
> 
>   2. If we pre-fetch multiple objects, some of them may be REF_DELTAs
>      against each other, leading to an infinite loop.
> 
> Off the top of my head, I am pretty sure your assumption holds for all
> versions of Git that support delta-base-offset[1]. But that feels a lot
> less certain to me. I could imagine an alternate server implementation,
> for example, that is gluing together packs and does not try hard to
> order the base before the delta, which would require it to use REF_DELTA
> instead of OFS_DELTA.

A cursory glance makes me think that REF_DELTA against a base object
also in the pack is already correctly handled. Right before the
invocation of conclude_pack() (which calls fix_unresolved_deltas(), the
function I modified), resolve_deltas() is invoked. The latter invokes
resolve_base() (directly or through threaded_second_pass()) which
invokes find_unresolved_deltas(), which invokes
find_unresolved_deltas_1(), which seems to handle both REF_DELTA and
OFS_DELTA.

Snipping the rest as I don't think we need to solve those if we can
handle REF_DELTA being against an object in a pack, but let me know if
you think that some of the points there still need to be addressed.