Web lists-archives.com

Re: Proposal for "fetch-any-blob Git protocol" and server design

On 03/15/2017 10:59 AM, Junio C Hamano wrote:
By "SHA-1s for which it wants blobs", you mean that "want" only
allows one exact blob object name?  I think it is necessary to
support that mode of operation as a base case, and it is a good
starting point.

When you know

 - you have a "partial" clone that initially asked to contain only
   blobs that are smaller than 10MB, and

 - you are now trying to do a "git checkout v1.0 -- this/directory"
   so that the directory is fully populated

instead of enumerating all the missing blobs from the output of
"ls-tree -r v1.0 this/directory" on separate "want" requests, you
may want to say "I want all the blobs that are not smaller than 10MB
in this tree object $(git rev-parse v1.0:this/directory)".
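To make the contrast concrete, here is a minimal Python sketch of the enumerating approach. It assumes the `git ls-tree -r -l` output format (`<mode> <type> <object> <size>` TAB `<path>`), and it assumes that fetch-blob-pack frames its requests as pkt-lines (4 hex digits of total length, then the payload, with `0000` as a flush), as Git's other protocols do. The helper names `missing_blob_wants` and `want_request` are hypothetical. The client ends up sending one "want" line per large blob, whereas an expression-based request would need a single line naming the tree.

```python
SIZE_LIMIT = 10 * 1024 * 1024  # the 10MB cutoff from the partial-clone example


def pkt_line(payload: str) -> bytes:
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    data = payload.encode("utf-8")
    return b"%04x" % (len(data) + 4) + data


FLUSH_PKT = b"0000"


def missing_blob_wants(ls_tree_output: str, limit: int = SIZE_LIMIT) -> list:
    """Return the names of blobs that a "<10MB only" partial clone lacks.

    Expects `git ls-tree -r -l` output, one entry per line:
    "<mode> <type> <object> <size>\t<path>".
    """
    oids = []
    for line in ls_tree_output.splitlines():
        meta, _path = line.split("\t", 1)
        _mode, obj_type, oid, size = meta.split()
        # Check the type first: non-blob entries (e.g. submodules) show "-" as size.
        if obj_type == "blob" and int(size) >= limit:
            oids.append(oid)
    return oids


def want_request(oids) -> bytes:
    """Encode one "want <oid>" pkt-line per blob, terminated by a flush-pkt."""
    return b"".join(pkt_line(f"want {oid}\n") for oid in oids) + FLUSH_PKT
```

The request grows linearly with the number of large blobs in the directory, which is the cost a tree-plus-expression request would avoid.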

I am not saying that you should add something like this right away,
but I am wondering how you would extend the proposed system to do
so.  Would you add "fetch-size-limited-blob-in-tree-pack" that runs
parallel to "fetch-blob-pack" request?  Would you add a new type of
request packet "want-blob-with-expression" for fbp-request, which is
protected by some "protocol capability" exchange?

If the former, how does a client discover if a particular server
already supports the new "fetch-size-limited-blob-in-tree-pack"
request, so that it does not have to send a bunch of "want" requests
by enumerating the blobs itself?  If the latter, how does a client
discover if a particular server's "fetch-blob-pack" already supports
the new "want-blob-with-expression" request packet?
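The "latter" option amounts to ordinary capability negotiation. A minimal sketch, assuming the server advertises a space-separated capability list before the client sends its request: the capability name is the one floated in the question above, while the `tree=`/`min-size=` argument syntax and the helpers `parse_capabilities` and `build_request` are hypothetical.

```python
def parse_capabilities(advertisement: str) -> set:
    """Split a space-separated capability advertisement into a set."""
    return set(advertisement.split())


def build_request(server_caps: set, tree_oid: str, blob_oids, min_size: int) -> list:
    """Pick the most compact request form the server says it understands.

    "want-blob-with-expression" is the capability name from the discussion;
    the argument syntax below is purely illustrative.
    """
    if "want-blob-with-expression" in server_caps:
        # A single line describes every wanted blob under the tree.
        return [f"want-blob-with-expression tree={tree_oid} min-size={min_size}"]
    # Server lacks the capability: enumerate the blobs on the client side.
    return [f"want {oid}" for oid in blob_oids]
```

This route presumes the server speaks first, as upload-pack does with its ref advertisement, so the client can decide before sending anything.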

I'm not sure that use case is something we need to worry about: if you're downloading x * 10MB of blobs, uploading x * 50B of "want" lines shouldn't be a problem, I think. But if we do want to handle that use case in the future, I agree that extending this system would be difficult.
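A quick back-of-the-envelope check of that estimate, assuming each "want" is sent as a pkt-line carrying a 40-hex-digit SHA-1: the 4-byte length prefix, `want `, 40 hex digits and a newline come to exactly 50 bytes, consistent with the 50B figure. The function names are illustrative, and the blob side counts uncompressed bytes.

```python
HEX_SHA1_LEN = 40
# 4-byte pkt-line length prefix + "want " + 40 hex digits + "\n"
WANT_PKT_BYTES = 4 + len("want ") + HEX_SHA1_LEN + len("\n")
FLUSH_PKT_BYTES = 4
MIN_BLOB_BYTES = 10 * 1024 * 1024  # blobs missing from a "<10MB only" clone


def request_bytes(num_blobs: int) -> int:
    """Bytes uploaded when every missing blob gets its own "want" line."""
    return num_blobs * WANT_PKT_BYTES + FLUSH_PKT_BYTES


def blob_bytes(num_blobs: int) -> int:
    """Uncompressed size of the wanted blobs; the pack may be smaller."""
    return num_blobs * MIN_BLOB_BYTES


n = 1000
print(WANT_PKT_BYTES)                    # 50
print(request_bytes(n))                  # 50004
print(request_bytes(n) / blob_bytes(n))  # ~4.77e-06
```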

The best way I can think of right now is for the client to send a fetch-blob-pack request with no "want" lines and at least one "want-tree" line and, if that fails (which will happen if the server is old and therefore sees that there is not at least one "want" line), to retry with "want" lines. This allows us to add alternative ways of specifying blobs later (if we want to), but it also means that upgrading a client without upgrading the corresponding server incurs a round-trip penalty.
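The try-then-retry strategy can be sketched with in-memory functions standing in for old and new servers. Everything here is hypothetical: "want-tree" is only the line type proposed in this message, the servers return blob names rather than a pack, and `fetch_tree_blobs` is an illustrative helper. It shows where the extra round trip against an old server comes from.

```python
from typing import Callable, Dict, List, Tuple


class RequestRejected(Exception):
    """A (simulated) server refused the request."""


def old_server(lines: List[str]) -> List[str]:
    """Pre-tree server: understands only "want <blob>" and requires at least one."""
    wants = [line.split()[1] for line in lines if line.startswith("want ")]
    if not wants:
        raise RequestRejected("expected at least one want line")
    return wants


def make_new_server(trees: Dict[str, List[str]]) -> Callable[[List[str]], List[str]]:
    """Newer server: also accepts "want-tree <tree>" and expands it to its blobs."""
    def serve(lines: List[str]) -> List[str]:
        blobs: List[str] = []
        for line in lines:
            kind, oid = line.split()
            if kind == "want":
                blobs.append(oid)
            elif kind == "want-tree":
                blobs.extend(trees[oid])
        if not blobs:
            raise RequestRejected("nothing wanted")
        return blobs
    return serve


def fetch_tree_blobs(server, tree_oid: str,
                     enumerate_blobs: Callable[[str], List[str]]) -> Tuple[List[str], int]:
    """Try "want-tree" first; on rejection, retry with enumerated "want" lines.

    Returns the blobs received and the number of round trips used.
    """
    try:
        return server([f"want-tree {tree_oid}"]), 1
    except RequestRejected:
        # Old server: pay one extra round trip and enumerate the tree locally.
        wants = [f"want {oid}" for oid in enumerate_blobs(tree_oid)]
        return server(wants), 2
```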

Alternatively, we could add rudimentary support for trees now and add filter-by-size later (so that such requests made to old servers would download extra blobs, but at least they would work). That still doesn't solve the general problem of specifying blobs by some rule other than their own SHA-1 or their tree's SHA-1.
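One way the "trees now, size filter later" option could degrade gracefully is for servers to ignore trailing arguments on a "want-tree" line that they do not understand. A sketch under that assumption; the `min-size=` argument and the `serve_want_tree` helper are hypothetical:

```python
from typing import Dict, List


def serve_want_tree(line: str, trees: Dict[str, List[str]], sizes: Dict[str, int],
                    supports_size_filter: bool) -> List[str]:
    """Return the blobs a server would send for a "want-tree <oid> [args]" line."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "want-tree":
        raise ValueError(f"not a want-tree line: {line!r}")
    blobs = trees[fields[1]]
    for arg in fields[2:]:
        key, _, value = arg.partition("=")
        if key == "min-size" and supports_size_filter:
            blobs = [b for b in blobs if sizes[b] >= int(value)]
        # Any other argument, or "min-size" on a server without filter support,
        # is ignored: the client may receive extra blobs but none are missing.
    return blobs
```

The older server's answer is a superset of the filtered one, which is what makes such requests "at least work".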