Re: How hard would it be to implement sparse fetching/pulling?
- Date: Sat, 2 Dec 2017 16:59:27 -0000
- From: "Philip Oakley" <philipoakley@xxxxxxx>
Thanks for the outline. It has helped clarify some points and see the very
The one thing I wasn't clear about is the "promised" objects/remote. Is that
"promisor" remote a fixed entity, or could it be one of many remotes that
could be a "provider"? (sort of like fetching sub-modules...)
From: "Jonathan Nieder" <jrnieder@xxxxxxxxx>
Sent: Friday, December 01, 2017 2:51 AM
Vitaly Arbuzov wrote:
I think it would be great if we agreed at a high level on the desired user
experience, so let me put a few possible use cases here.
I think one thing this thread is pointing to is a lack of overview
documentation about how the 'partial clone' series currently works.
The basic components are:
1. extending git protocol to (1) allow fetching only a subset of the
objects reachable from the commits being fetched and (2) later,
going back and fetching the objects that were left out.
We've also discussed some other protocol changes, e.g. to allow
obtaining the sizes of un-fetched objects without fetching the
2. extending git's on-disk format to allow having some objects not be
present but only be "promised" to be obtainable from a remote
repository. When running a command that requires those objects,
the user can choose to have it either (a) error out ("airplane
mode") or (b) fetch the required objects.
It is still possible to work fully locally in such a repo, make
changes, get useful results out of "git fsck", etc. It is kind of
similar to the existing "shallow clone" feature, except that there
is a more straightforward way to obtain objects that are outside
the "shallow" clone when needed on demand.
3. improving everyday commands to require fewer objects. For
example, if I run "git log -p", then I want to see the history of
most files but I don't necessarily want to download large binary
files just to print 'Binary files differ' for them.
And by the same token, we might want to have a mode for commands
like "git log -p" to default to restricting to a particular
directory, instead of downloading files outside that directory.
There are some fundamental changes to make in this category ---
e.g. modifying the index format to not require entries for files
outside the sparse checkout, to avoid having to download the
trees for them.
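As a rough illustration of component 1 above, the two protocol steps can be modelled in a few lines of Python. This is only a toy sketch, not git's wire protocol: the object ids, the dict-based "server", the function names, and the choice of a blob size limit as the filter are all assumptions made for the example.

```python
# Toy model only: object ids, the dict layout, and all function names are
# hypothetical and do not reflect git's wire protocol or data structures.

# "Server" object database: id -> (type, payload).
# Commits and trees list the ids they reference; blobs hold bytes.
SERVER = {
    "c1": ("commit", ["t1"]),
    "t1": ("tree", ["b_small", "b_large"]),
    "b_small": ("blob", b"hello\n"),
    "b_large": ("blob", b"\0" * 10_000),
}


def reachable(objects, start):
    """Return every object id reachable from `start`, in discovery order."""
    seen, stack, order = set(), [start], []
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        order.append(oid)
        kind, payload = objects[oid]
        if kind != "blob":
            stack.extend(payload)  # follow references from commits and trees
    return order


def partial_fetch(objects, commit, max_blob_size):
    """Step (1): fetch reachable objects, leaving out blobs over the limit.

    The size limit is an assumed filter chosen for illustration.
    """
    fetched, omitted = {}, set()
    for oid in reachable(objects, commit):
        kind, payload = objects[oid]
        if kind == "blob" and len(payload) > max_blob_size:
            omitted.add(oid)  # left out now, obtainable later
        else:
            fetched[oid] = (kind, payload)
    return fetched, omitted


def fetch_missing(objects, local, omitted, wanted):
    """Step (2): go back later and fetch specific objects that were left out."""
    for oid in wanted:
        if oid in omitted:
            local[oid] = objects[oid]
            omitted.discard(oid)


local, omitted = partial_fetch(SERVER, "c1", max_blob_size=1024)
```

After the first call, `local` holds the commit, the tree, and the small blob, while `omitted` records the large blob that was skipped. Calling `fetch_missing(SERVER, local, omitted, ["b_large"])` later fills in the gap.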
The overall goal is to make git scale better.
The existing patches do (1) and (2), though it is possible to do more
in those categories. :) We have plans to work on (3) as well.
These are overall changes that happen at a fairly low level in git.
They mostly don't require changes command-by-command.
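One way to picture why the changes can live at a low level: if the object-reading layer itself knows about "promised" objects, the commands built on top of it need no special handling. The sketch below is a toy model under that assumption. `ObjectStore`, `MissingObjectError`, the `offline` flag (standing in for "airplane mode"), and the two example commands are hypothetical names, not git internals.

```python
# Toy model only: class and function names are hypothetical, not git internals.


class MissingObjectError(Exception):
    """Raised when a promised object is needed but fetching is disabled."""


class ObjectStore:
    def __init__(self, local, promised, remote, offline=False):
        self.local = dict(local)       # objects present on disk
        self.promised = set(promised)  # absent, but obtainable from the remote
        self.remote = remote           # stand-in for the promisor remote
        self.offline = offline         # True models "airplane mode"

    def read(self, oid):
        """Single low-level entry point used by every command."""
        if oid in self.local:
            return self.local[oid]
        if oid not in self.promised:
            # Neither present nor promised: a genuinely missing object.
            raise KeyError(oid)
        if self.offline:
            raise MissingObjectError(f"{oid} is promised but not fetched")
        data = self.remote[oid]        # fetch on demand
        self.local[oid] = data
        self.promised.discard(oid)
        return data


# Two "commands" that know nothing about promised objects.
def show(store, oid):
    return store.read(oid).decode()


def size(store, oid):
    return len(store.read(oid))


remote = {"a": b"alpha", "b": b"beta"}
store = ObjectStore(local={"a": b"alpha"}, promised={"b"}, remote=remote)
airplane = ObjectStore(local={"a": b"alpha"}, promised={"b"}, remote=remote,
                       offline=True)
```

Here `show(store, "b")` fetches the object on demand, while `size(airplane, "b")` raises `MissingObjectError`. Objects that are already local work identically in both modes.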