Re: How de-duplicate similar repositories with alternates
- Date: Tue, 4 Dec 2018 02:06:02 -0500
- From: Jeff King <peff@xxxxxxxx>
On Thu, Nov 29, 2018 at 10:55:49AM -0800, Stefan Beller wrote:
> On Thu, Nov 29, 2018 at 7:00 AM Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
> > A co-worker asked me today how space could be saved when you have
> > multiple checkouts of the same repository (at different revs) on the
> > same machine. I said that since these won't de-duplicate well at the
> > block level, one way to do this is with alternates.
> Another way is to use git-worktree, which would solve the gc issues
> mentioned below?
> I view alternates as a historic artefact, as the deduping
> of objects client side can be done using worktrees, and on the
> server side - I think - most of the git hosters use namespaces
> and put a fork network into the same repository and use pack islands.

Nope, we definitely use alternates. The ref namespace support in Git is
not nearly complete enough to run a modern hosting site; it only kicks
in for upload-pack and receive-pack. Other commands (e.g., rev-list to
traverse for a history-view page) have no support at all. So we share
object storage, but not ref storage.
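
To make "share object storage, but not ref storage" concrete, here is a
minimal sketch of pointing one repository at another's object directory
through the objects/info/alternates file. The helper name add_alternate
and the example paths are hypothetical, and a real setup would more
likely use git's own tooling than hand-written file edits.

```python
import os


def add_alternate(git_dir, shared_objects_dir):
    """Record shared_objects_dir in git_dir/objects/info/alternates.

    The alternates file lists one object directory per line. Only
    objects are borrowed; the repository keeps its own refs.
    (Hypothetical helper for illustration, not part of git.)
    """
    info_dir = os.path.join(git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")
    entry = os.path.abspath(shared_objects_dir)

    # Read any entries already present so the same path is not added twice.
    existing = []
    if os.path.exists(alternates):
        with open(alternates) as f:
            existing = [line for line in f.read().splitlines() if line]

    if entry not in existing:
        existing.append(entry)
        with open(alternates, "w") as f:
            f.write("\n".join(existing) + "\n")
    return alternates
```

Each fork would call something like add_alternate("fork.git",
"network.git/objects") once, and from then on it can read objects from
the shared store while its refs stay in its own repository.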

In theory the caller could namespace requests (e.g., the user asks for
"foo", the web site feeds "refs/forks/$id/refs/heads/foo" to git). But
any bugs are a lot more likely to lead to security problems (oops, you
accidentally wrote into somebody else's fork!). And ref storage has
traditionally been a sore point for scaling, so giving each fork its own
repo and refs helps break that up.
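
As a rough illustration of the caller-side namespacing described above,
the mapping might look like the sketch below. The function name
namespaced_ref and its validation are assumptions made for this example;
the checks are far looser than git's real ref-format rules, which is
exactly where the security bugs would creep in.

```python
def namespaced_ref(fork_id, branch):
    """Map a user-supplied branch name into a per-fork ref namespace.

    Mirrors the "refs/forks/$id/refs/heads/foo" layout mentioned above.
    (Hypothetical helper; the validation is a simplified stand-in.)
    """
    if not isinstance(fork_id, int) or fork_id < 0:
        raise ValueError("fork id must be a non-negative integer")
    # Reject names that could escape the fork's namespace or are empty.
    if not branch or ".." in branch or branch.startswith("/") or branch.endswith("/"):
        raise ValueError("suspicious branch name: %r" % (branch,))
    return "refs/forks/%d/refs/heads/%s" % (fork_id, branch)
```

Every code path that touches a ref would have to go through a function
like this one. Miss one, and a request lands in somebody else's fork.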

By contrast, object storage is pretty easy to share. It scales
reasonably well, and the security model is much simpler due to the
immutable nature of object names.
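
To show why object names are immutable, here is a small sketch of how a
blob's name is derived from its content, assuming the SHA-1 object
format. The function name blob_object_name is made up for this example.

```python
import hashlib


def blob_object_name(content):
    """Return the object name git gives a blob with this content.

    The name is the SHA-1 of a "blob <size>\\0" header followed by the
    raw bytes, so the same content always maps to the same name.
    (Assumes a SHA-1 repository; helper name is hypothetical.)
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(blob_object_name(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_object_name(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because the name is a function of the bytes, one fork cannot change what
an existing name refers to. The worst it can do is add new objects to
the shared store.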