Re: How hard would it be to implement sparse fetching/pulling?
- Date: Fri, 1 Dec 2017 10:24:46 -0800
- From: Jonathan Nieder <jrnieder@xxxxxxxxx>
- Subject: Re: How hard would it be to implement sparse fetching/pulling?
Jeff Hostetler wrote:
> On 11/30/2017 6:43 PM, Philip Oakley wrote:
>> The 'companies' problem is that it tends to force a client-server, always-on
>> on-line mentality. I'm also wanting the original DVCS off-line capability to
>> still be available, with _user_ control, in a generic sense, of what they
>> have locally available (including files/directories they have not yet looked
>> at, but expect to have). IIUC Jeff's work is that on-line view, without the
>> off-line capability.
>> I'd commented early in the series at [1,2,3].
> Yes, this does tend to lead towards an always-online mentality.
> However, there are 2 parts:
> [a] dynamic object fetching for missing objects, such as during a
> random command like diff or blame or merge. We need this
> regardless of usage -- because we can't always predict (or
> dry-run) every command the user might run in advance.
> [b] batch fetch mode, such as using partial-fetch to match your
> sparse-checkout so that you always have the blobs of interest
> to you. And assuming you don't wander outside of this subset
> of the tree, you should be able to work offline as usual.
> If you can work within the confines of [b], you wouldn't need to
> always be online.
Just to amplify this: for our internal use we care a lot about
disconnected usage working. So it is not like we have forgotten about
this use case.
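
To make the difference between [a] and [b] above concrete, here is a toy
sketch in Python. It is not how git implements any of this: Remote,
PartialRepo, and the path-prefix matching are hypothetical names made up
for illustration, and it assumes the trees are available locally so that
only blobs can be missing.

```python
# Toy model only; none of these names correspond to git internals.
class Remote:
    def __init__(self, blobs):
        self.blobs = dict(blobs)   # path -> blob content
        self.online = True
        self.requests = 0          # number of round trips served

    def fetch(self, paths):
        if not self.online:
            raise ConnectionError("remote unreachable")
        self.requests += 1
        return {p: self.blobs[p] for p in paths}


class PartialRepo:
    def __init__(self, remote, known_paths, sparse_prefixes):
        self.remote = remote
        self.known_paths = list(known_paths)      # assumed known from local trees
        self.sparse_prefixes = tuple(sparse_prefixes)
        self.local = {}                           # blobs we actually have

    def batch_fetch(self):
        # [b]: one round trip for every blob inside the sparse patterns.
        wanted = [p for p in self.known_paths
                  if p.startswith(self.sparse_prefixes)]
        self.local.update(self.remote.fetch(wanted))

    def read(self, path):
        # [a]: fetch a single missing blob on demand.
        if path not in self.local:
            self.local.update(self.remote.fetch([path]))
        return self.local[path]


remote = Remote({"src/a.c": "A", "src/b.c": "B", "docs/x.txt": "X"})
repo = PartialRepo(remote, remote.blobs.keys(), ["src/"])
repo.batch_fetch()
remote.online = False
print(repo.read("src/a.c"))   # inside the sparse subset, so no network needed
```

Reading "docs/x.txt" at this point would raise ConnectionError, which is
the "wander outside of this subset" case: it only works while the remote
is reachable.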
> We might also add a part [c] with explicit commands to back-fill or
> alter your incomplete view of the ODB.
Agreed, this will be a nice thing to add.
>> At its core, my idea was to use the object store to hold markers for the
>> 'not yet fetched' objects (mainly trees and blobs). These would be in a
>> known fixed format, and have the same effect (conceptually) as the
>> sub-module markers - they _confirm_ the oid, yet say 'not here, try
> We do have something like this. Jonathan can explain better than I, but
> basically, we denote possibly incomplete packfiles from partial clones
> and fetches as "promisor" and have special rules in the code to assert
> that a missing blob referenced from a "promisor" packfile is OK and can
> be fetched later if necessary from the "promising" remote.
> The main problem with markers or other lists of missing objects is
> that they have scale problems for large repos.
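
A rough sketch of the "promisor" rule described above, again as a toy
Python model rather than git's actual data structures. Pack and classify
are hypothetical names, and each pack is reduced to a map from object id
to the ids it references. The point it tries to show is that no
per-object marker is stored: a missing object is acceptable when
something in a promisor pack refers to it, so the only extra state is
one flag per pack.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    promisor: bool                               # came from a partial clone/fetch?
    objects: dict = field(default_factory=dict)  # oid -> list of referenced oids


def classify(oid, packs):
    """Return 'present', 'promised', or 'missing' for an object id."""
    if any(oid in p.objects for p in packs):
        return "present"
    for p in packs:
        # Missing, but referenced from a promisor pack: OK, fetch later.
        if p.promisor and any(oid in refs for refs in p.objects.values()):
            return "promised"
    # Missing and nothing promised it: treat as a real error.
    return "missing"


packs = [
    Pack(promisor=True, objects={"tree1": ["blobA", "blobB"], "blobA": []}),
    Pack(promisor=False, objects={"tree2": ["blobC"]}),
]
for oid in ("blobA", "blobB", "blobC"):
    print(oid, classify(oid, packs))
```

Here blobB is absent but referenced from the promisor pack, so it counts
as promised, while blobC is referenced only from an ordinary pack and is
reported as missing.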
Any chance that we can get a design doc in Documentation/technical/
giving an overview of the design, with a brief "alternatives
considered" section describing this kind of thing?
E.g. some of the earlier descriptions like
may help as a starting point.