Re: [RFC 0/4] Shallow clones with on-demand fetch
- Date: Tue, 7 Mar 2017 04:42:47 -0500
- From: Jeff King <peff@xxxxxxxx>
- Subject: Re: [RFC 0/4] Shallow clones with on-demand fetch
On Mon, Mar 06, 2017 at 11:18:30AM -0800, Junio C Hamano wrote:
> Mark Thomas <markbt@xxxxxxxxxx> writes:
> > This is a proof-of-concept, so it is in no way complete. It contains a
> > few hacks to make it work, but these can be ironed out with a bit more
> > work. What I have so far is sufficient to try out the idea.
> Two things that immediately come to mind (which may or may not be
> real issues) are
> (1) What (if any) security model you have in mind.
> From object-confidentiality's point of view, this needs to be
> enabled only on a host that allows
> uploadpack.allowAnySHA1InWant, and is arguably even riskier.
> From a DoS point of view, you can make a short 40-byte request to
> cause the other side to emit megabytes of stuff. I do not think
> it is a new problem (anybody can repeatedly request a clone of
> large stuff), but there may be new ramifications.
> (2) Whether the interface to ask for just one object kills the whole
> idea due to roundtrip latency.
> You may want to be able to say "I want all objects reachable
> from this tree; please give me a packfile of needed objects
> assuming that I have all objects reachable from this other tree
> (or these other trees)".
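[For what it's worth, that batched "want these trees, have those trees"
request maps closely onto existing plumbing; a rough local sketch, where
$WANT and $HAVE are placeholder revisions, might look like:]

```shell
# "Give me all objects reachable from $WANT, assuming I already have
# everything reachable from $HAVE" -- roughly what pack generation
# already does locally. $WANT and $HAVE are placeholder revisions.
git rev-list --objects "$WANT" --not "$HAVE" |
  git pack-objects --stdout > incremental.pack
```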
Not just latency, but you also lose all of the benefits of delta
compression. So if I asked for:
    git log -p -- foo.c
and git is going to fault in all of the various versions of foo.c over
time, it's _much_ more efficient to batch them into a single request, so
that the server can reuse on-disk deltas between the various versions.
That makes the transmission smaller, and it also makes it more likely
for the server to be able to transmit the bits straight off the disk
(rather than assembling each delta itself and then zlib-compressing the
result).
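[That kind of batching is what pack-objects already does locally when it
is handed the full object list in one pass; a sketch, reusing the foo.c
path from above and an arbitrary "batch" output name:]

```shell
# Hand pack-objects the whole object list in one pass so it can reuse
# on-disk deltas between the versions; "batch" is just a placeholder
# base name for the resulting batch-<hash>.pack file.
git rev-list --objects HEAD -- foo.c |
  git pack-objects batch
```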
Similarly, there's a latency tension in just finding out whether an
object exists. When we call has_sha1_file() as part of a fetch, for
example, we really want to be able to answer it quickly. So you'd
probably want some mechanism to say "tell me the sha1, type, and size"
of each object I _could_ get via upload-file. The size of that data is
far from trivial for a large repository, but you're probably better off
getting it once than paying the latency cost to fetch it piecemeal.
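[Locally, that sha1/type/size triple is exactly what cat-file's batch
mode prints; presumably a bulk protocol reply would look something like:]

```shell
# Dump "sha1 type size" for every object reachable from HEAD, the
# local analogue of the bulk listing described above. The cut strips
# the path names that rev-list --objects appends after each sha1.
git rev-list --objects HEAD |
  cut -d' ' -f1 |
  git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize)'
```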