Web lists-archives.com

Re: How hard would it be to implement sparse fetching/pulling?

On 11/30/2017 3:03 PM, Jonathan Nieder wrote:
Hi Vitaly,

Vitaly Arbuzov wrote:

Found some details here: https://github.com/jeffhostetler/git/pull/3

Looking at commits I see that you've done a lot of work already,
including packing, filtering, fetching, cloning etc.
What are some areas that aren't complete yet? Do you need any help
with implementation?

That's a great question!  I've filed https://crbug.com/git/2 to track
this project.  Feel free to star it to get updates there, or to add
updates of your own.


As described at https://crbug.com/git/2#c1, currently there are three
patch series for which review would be very welcome.  Building on top
of them is welcome as well.  Please make sure to coordinate with
jeffhost@xxxxxxxxxxxxx and jonathantanmy@xxxxxxxxxx (e.g. through the
bug tracker or email).

One piece of missing functionality that looks intereseting to me: that
series batches fetches of the missing blobs involved in a "git
checkout" command:


But if doesn't batch fetches of the missing blobs involved in a "git
diff <commit> <commit>" command.  That might be a good place to get
your hands dirty. :)

Jonathan Tan added code in unpack-trees to bulk fetch missing blobs
before a checkout.  This is limited to the missing blobs needed for
the target commit.  We need this to make checkout seamless, but it
does mean that checkout may need online access.

I've also talked about a pre-fetch capability to bulk fetch missing
blobs in advance of some operation.  You could speed up the above
diff command or back-fill all the blobs I might need before going
offline for a while.

You can use the options that were added to rev-list to help with this.
For example:
    git rev-list --objects [--filter=<fs>] --missing=print <commit1>
    git rev-list --objects [--filter=<fs>] --missing=print <c1>..<c2>
And then pipe that into a "git fetch-pack --stdin".

You might experiment with this.