Web lists-archives.com

Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases




> On Thu, May 16, 2019 at 11:26:46AM -0700, Jonathan Tan wrote:
> 
> > > Pretty unlikely, but should we put some kind of circuit-breaker into the
> > > client to ensure this?
> > 
> > I thought of this - such a server could, but it seems to me that it
> > would be similar to a server streaming random bytes to us without
> > stopping (which is already possible).
> 
> True. I was thinking mainly of the infinite-redirection protections we
> put in place for https. But I agree that in general, since we don't have
> inherent limits on the size of workloads, that servers can already troll
> clients pretty hard in a variety of ways.
> 
> So I could go either way, though I do think it makes sense for on-demand
> fetches for partial clones to avoid asking for thin packs as a general
> principle.

This should not be a problem since fetch-pack can already know that
we're doing an on-demand fetch (args->no_dependents), so we should be
able to either plumb a "no-thin-pack" arg in the same way or rename
args->no_dependents to also encompass the no-thin-pack option. But this
can be done separately from this patch set, I think.

> As a matter of fact, should partial clones _always_ avoid
> asking for thin packs?  That would make this issue go away entirely.
> 
> Sometimes it would be more efficient (we do not have to get an extra
> base object just to resolve the delta we needed) but sometimes worse (if
> we did actually have the base, it's a win). Whether it's a win would
> depend on the "hit" rate, and I suspect that is heavily dependent on
> workload characteristics (what kind of filtering is in use, are we
> topping up in a non-partial way, etc).

I think it's best if we still allow servers to serve thin packs. For
example, if we're excluding only large blobs, clients would still want
servers to be able to delta against blobs that they have.

> Right, REF_DELTA is definitely correctly handled currently, and I don't
> think that would break with your patch. It's just that your patch would
> introduce a bunch of extra traffic as we request bases separately that
> are already in the pack.

Ah...I see. For this problem, I think that it can be solved with the
"if (objects[d->obj_no].real_type != OBJ_REF_DELTA)" check that the
existing code uses before calling read_object(). I'll include this in
the next reroll if any other issue comes up.