
Re: How hard would it be to implement sparse fetching/pulling?






On 12/1/2017 1:24 PM, Jonathan Nieder wrote:
Jeff Hostetler wrote:
On 11/30/2017 6:43 PM, Philip Oakley wrote:

The 'companies' problem is that it tends to force a client-server, always-on
on-line mentality. I'm also wanting the original DVCS off-line capability to
still be available, with _user_ control, in a generic sense, of what they
have locally available (including files/directories they have not yet looked
at, but expect to have). IIUC Jeff's work is that on-line view, without the
off-line capability.

I'd commented early in the series at [1,2,3].

Yes, this does tend to lead towards an always-online mentality.
However, there are 2 parts:
[a] dynamic object fetching for missing objects, such as during a
     random command like diff or blame or merge.  We need this
     regardless of usage -- because we can't always predict (or
     dry-run) every command the user might run in advance.
[b] batch fetch mode, such as using partial-fetch to match your
     sparse-checkout so that you always have the blobs of interest
     to you.  And assuming you don't wander outside of this subset
     of the tree, you should be able to work offline as usual.
If you can work within the confines of [b], you wouldn't need to
always be online.
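
To make the [a]/[b] distinction concrete, here is a toy model in Python. It
is only an illustration and not how git implements either mode: the
`PartialStore` class, its `remote` dict, and the path-prefix matching are all
hypothetical stand-ins for the real object store, the promising remote, and
the sparse-checkout specification.

```python
class OfflineError(Exception):
    """Raised when an object is missing locally and the remote is unreachable."""


class PartialStore:
    """Toy object store keyed by path (hypothetical; real git keys by object id)."""

    def __init__(self, remote, online=True):
        self.remote = remote    # dict: path -> blob bytes, stands in for the server
        self.local = {}         # blobs we actually have on disk
        self.online = online

    def batch_fetch(self, prefixes):
        """[b] Prefetch every blob under the given sparse-checkout prefixes."""
        if not self.online:
            raise OfflineError("batch fetch needs the remote")
        for path, blob in self.remote.items():
            if any(path.startswith(p) for p in prefixes):
                self.local[path] = blob

    def read(self, path):
        """[a] Return a blob, fetching it on demand if it is missing."""
        if path not in self.local:
            if not self.online:
                raise OfflineError(path)
            self.local[path] = self.remote[path]
        return self.local[path]


if __name__ == "__main__":
    store = PartialStore({"src/a.c": b"int x;", "docs/x.txt": b"notes"})
    store.batch_fetch(["src/"])     # match the sparse-checkout while online
    store.online = False            # now disconnect
    print(store.read("src/a.c"))    # inside the subset: works offline
    try:
        store.read("docs/x.txt")    # outside the subset: needs [a], so it fails
    except OfflineError as err:
        print("would need the network for", err)
```

In this model, offline work succeeds exactly as long as every read stays
inside the prefetched subset, which is the condition described in [b].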

Just to amplify this: for our internal use we care a lot about
disconnected usage working.  So it is not like we have forgotten about
this use case.

We might also add a part [c] with explicit commands to back-fill or
alter your incomplete view of the ODB.

Agreed, this will be a nice thing to add.

[...]
At its core, my idea was to use the object store to hold markers for the
'not yet fetched' objects (mainly trees and blobs). These would be in a
known fixed format, and have the same effect (conceptually) as the
sub-module markers - they _confirm_ the oid, yet say 'not here, try
elsewhere'.

We do have something like this.  Jonathan can explain better than I, but
basically, we denote possibly incomplete packfiles from partial clones
and fetches as "promisor" and have special rules in the code to assert
that a missing blob referenced from a "promisor" packfile is OK and can
be fetched later if necessary from the "promising" remote.
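
As a rough illustration of that rule, the Python sketch below models a
connectivity check that excuses missing objects when they are referenced from
a promisor pack. It is a simplified model under stated assumptions, not git's
actual code: the `Pack` type, the `unexcused_missing` function, and the string
object ids are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    """Hypothetical stand-in for a packfile."""
    objects: dict            # oid -> list of oids that object references
    promisor: bool = False   # True if the pack came from a partial clone/fetch


def unexcused_missing(packs):
    """Return referenced-but-absent oids that no promisor pack vouches for."""
    have = set()
    promised = set()
    for pack in packs:
        have.update(pack.objects)
        if pack.promisor:
            # Anything a promisor pack references may be fetched later.
            for refs in pack.objects.values():
                promised.update(refs)

    missing = set()
    for pack in packs:
        for refs in pack.objects.values():
            for oid in refs:
                if oid not in have and oid not in promised:
                    missing.add(oid)
    return sorted(missing)


if __name__ == "__main__":
    packs = [
        Pack({"tree1": ["blobA", "blobB"]}, promisor=True),  # partial fetch
        Pack({"tree2": ["blobC"]}),                          # ordinary pack
    ]
    # blobA and blobB are absent but promised; blobC is a real error.
    print(unexcused_missing(packs))
```

Note that nothing here stores a per-object marker or a list of what is
missing: the set of acceptable gaps is derived from the promisor packs
themselves.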

The main problem with markers or other lists of missing objects is
that they have scale problems for large repos.

Any chance that we can get a design doc in Documentation/technical/
giving an overview of the design, with a brief "alternatives
considered" section describing this kind of thing?

Yeah, I'll start one.  I have notes within the individual protocol
docs and man-pages, but no summary doc.  Thanks!


E.g. some of the earlier descriptions like
  https://public-inbox.org/git/20170915134343.3814dc38@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
  https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@xxxxxxxxxx/
  https://public-inbox.org/git/20170113155253.1644-1-benpeart@xxxxxxxxxxxxx/
may help as a starting point.

Thanks,
Jonathan