Re: How hard would it be to implement sparse fetching/pulling?
- Date: Mon, 4 Dec 2017 10:53:32 -0500
- From: Jeff Hostetler <git@xxxxxxxxxxxxxxxxx>
On 12/1/2017 1:24 PM, Jonathan Nieder wrote:
Jeff Hostetler wrote:
On 11/30/2017 6:43 PM, Philip Oakley wrote:
The 'companies' problem is that it tends to force a client-server, always-on
on-line mentality. I'm also wanting the original DVCS off-line capability to
still be available, with _user_ control, in a generic sense, of what they
have locally available (including files/directories they have not yet looked
at, but expect to have). IIUC Jeff's work is that on-line view, without the
I'd commented early in the series at [1,2,3].
Yes, this does tend to lead towards an always-online mentality.
However, there are 2 parts:
  [a] dynamic object fetching for missing objects, such as during a
      random command like diff or blame or merge. We need this
      regardless of usage -- because we can't always predict (or
      dry-run) every command the user might run in advance.
  [b] batch fetch mode, such as using partial-fetch to match your
      sparse-checkout so that you always have the blobs of interest
      to you. And assuming you don't wander outside of this subset
      of the tree, you should be able to work offline as usual.
If you can work within the confines of [b], you wouldn't need to
always be online.
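
The split between [a] and [b] can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Git implements either mode: `ToyRemote`, `ToyPartialRepo`, `batch_fetch` and `read_blob` are hypothetical names, blobs are keyed by path rather than by object id, and the client peeks at the server's path list where a real client would walk the trees it already has.

```python
import fnmatch


class ToyRemote:
    """Hypothetical server: maps path -> blob content."""

    def __init__(self, blobs):
        self.blobs = dict(blobs)
        self.requests = 0  # number of round trips made to the server

    def fetch(self, paths):
        self.requests += 1
        return {p: self.blobs[p] for p in paths}


class ToyPartialRepo:
    """Hypothetical client that holds only some of the remote's blobs."""

    def __init__(self, remote):
        self.remote = remote
        self.online = True
        self.local = {}

    def batch_fetch(self, patterns):
        # [b] one up-front request for every path matching the sparse patterns.
        wanted = [p for p in self.remote.blobs
                  if any(fnmatch.fnmatch(p, pat) for pat in patterns)]
        self.local.update(self.remote.fetch(wanted))

    def read_blob(self, path):
        # [a] fetch on demand when a command touches a blob we do not have.
        if path not in self.local:
            if not self.online:
                raise LookupError(f"{path} is missing and we are offline")
            self.local.update(self.remote.fetch([path]))
        return self.local[path]
```

In this model, after `batch_fetch(["src/*"])` every read under `src/` is served locally even with `online = False`; only a read outside that subset needs the network, which is the situation [a] exists to handle.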
Just to amplify this: for our internal use we care a lot about
disconnected usage working. So it is not like we have forgotten about
this use case.
We might also add a part [c] with explicit commands to back-fill or
alter your incomplete view of the ODB.
Agreed, this will be a nice thing to add.
At its core, my idea was to use the object store to hold markers for the
'not yet fetched' objects (mainly trees and blobs). These would be in a
known fixed format, and have the same effect (conceptually) as the
sub-module markers - they _confirm_ the oid, yet say 'not here, try
We do have something like this. Jonathan can explain better than I, but
basically, we denote possibly incomplete packfiles from partial clones
and fetches as "promisor" and have special rules in the code to assert
that a missing blob referenced from a "promisor" packfile is OK and can
be fetched later if necessary from the "promising" remote.
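
A minimal sketch of the rule just described, assuming a toy data model rather than Git's real pack format or code paths: `ToyPack` and `classify_missing` are hypothetical names, and each pack is reduced to a map from object id to the ids it references. A missing object is treated as acceptable if any object in a promisor pack references it, and as a problem otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPack:
    """Hypothetical packfile: a promisor flag plus oid -> referenced oids."""
    promisor: bool
    objects: dict = field(default_factory=dict)


def classify_missing(packs):
    """Return (promised, broken) sets of object ids absent from every pack."""
    present = set()
    for pack in packs:
        present.update(pack.objects)

    promised, broken = set(), set()
    for pack in packs:
        for refs in pack.objects.values():
            for oid in refs:
                if oid in present:
                    continue
                # Missing: fine if a promisor pack vouches for it.
                (promised if pack.promisor else broken).add(oid)

    # A reference from any promisor pack is enough to excuse the gap.
    broken -= promised
    return promised, broken
```

Note that nothing here records the missing ids ahead of time; they are discovered from references inside the packs that are present, and the only stored state is one flag per pack.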
The main problem with markers or other lists of missing objects is
that they scale poorly for large repos.
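
A back-of-the-envelope sketch of why such a list grows awkward. The only fixed fact used is that a raw SHA-1 object id is 20 bytes; the object counts are illustrative values, not measurements from any repository, and `missing_list_bytes` is a hypothetical helper.

```python
OID_BYTES = 20  # size of one raw SHA-1 object id


def missing_list_bytes(n_missing, oid_bytes=OID_BYTES):
    """Lower bound on the size of a flat list of missing object ids."""
    return n_missing * oid_bytes


# Illustrative counts only: the list grows linearly with what was omitted.
for n in (10**5, 10**7, 10**9):
    print(f"{n:>13,} missing objects -> at least {missing_list_bytes(n):,} bytes")
```

This counts the ids alone, before any per-marker overhead or the cost of keeping the list in sync with the server, and it grows with the number of omitted objects rather than with the number of packfiles received.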
Any chance that we can get a design doc in Documentation/technical/
giving an overview of the design, with a brief "alternatives
considered" section describing this kind of thing?
Yeah, I'll start one. I have notes within the individual protocol
docs and man-pages, but no summary doc. Thanks!
E.g. some of the earlier descriptions like
may help as a starting point.