Web lists-archives.com

Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases




On Thu, May 16, 2019 at 10:43:03AM +0900, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > I agree that the current implementation (and probably any sane
> > implementation) would not send us a delta if we have not provided any
> > haves. But this does mean that a malicious server could send a client
> > into an infinite loop.
> >
> > Pretty unlikely, but should we put some kind of circuit-breaker into the
> > client to ensure this?
> 
> That's a pretty good point.  Would it be suffice to have a new
> option to tell index-pack that fattens a thin pack and unpack-objects
> that expands objects in a small incoming packfile into loose objects
> that they are forbidden from on-demand fatching during this invocation,
> as it is an error for the packfile they are digesting to depend on a
> lazy objects?

Yeah, that's what I was thinking. The parent index-pack sets an
environment variable for "do not recurse to get partials" when it calls
fetch. But that said...

> > I dunno. Maybe we should just ignore it. It's a fundamental issue with
> > partial clones that we're going to have to fetch extra junk here anyway,
> 
> Would it be an option not to ask for a thin pack in the first place?

Yes, that seems much simpler. The flag is really "do not ask for a thin
pack, and do not pass --fix-thin to index-pack". And this recursion
should not kick in when --fix-thin is not in effect (I didn't check
Jonathan's patch to see whether that is the case, but I think that ought
to be the rule regardless of how we decide to address the recursion
issue).

One snag: I don't think unpack-objects understands --fix-thin. It just
looks in the object database and finds either a recently-written object
or one we already have, and it doesn't care about the difference.

For that matter, does unpack-objects need the same treatment here? I
guess not, because it does not disable fetch_if_missing in the same way.
I guess it is already susceptible to the infinite-recursion thing, then.
Or maybe not. We always use index-pack for promisor remotes. So even
though the _first_ fetch which yields REF_DELTA may be from a
non-promisor, any subsequent one would be. So we'd never recurse more
than once. So I think the emergent behavior does what we want. ;)

I worked up some patches a while ago to try to replace unpack-objects
with an index-pack mode that explodes objects (just so we could stop
keeping two almost-the-same code bases around, both of which are
security-critical as they take in untrusted objects, and one of which
clearly gets a lot more attention than the other). But I got hung up a
bit on their strategies for handling base objects. IIRC, the
unpack-objects one does things in a different order that's more
efficient for some cases, and I worried that somebody would care.

I can resurrect that if there's interest (though I do think by the
reasoning above that it's orthogonal to this particular patch series).

> > If we're willing to modify the format, one thing we _could_ do is have
> > the server communicate the expectations for each base. I.e., introduce a
> > new THIN_DELTA type that behaves exactly as a REF_DELTA, but with the
> > extra 1-bit of knowledge that the server knows it is not including the
> > base in the pack. I'm not sure how painful that retro-fitting would be.
> > It would need at least a new capability and options to pack-objects and
> > index-pack. We might be tight on bits in the packfile type field.
> 
> The type field is tight, but I wonder how much such a new
> representation would help.  Unless the receiving end blindly trusts
> what the sender says, there needs to be a logic to detect cyclic
> dependencies while following such a delta chain to lazy-fill
> promised objects on the receiving end anyway, no?

It would let the client know how the server expects it to handle each
delta. The client can then (without trusting the server) say:

  - this is REF_DELTA; the server claims that the base is in this pack,
    so I will not prefetch it. If it turns out not to be, then it is an
    error and I will reject the pack (and _not_ try to fetch it on
    demand)

  - this is THIN_DELTA; I will try to fetch it from the promisor remote,
    and not recurse (because the promisor fetch will not ask for a
    thin-pack). If it also turns out to be in the pack, so be it. We'll
    have two copies and will have wasted the server's bandwidth.

So the client gets to optimize by following the server's directions, but
the worst case for a lying server is not a big deal (it's just more
pessimal).

-Peff