Re: git branch performance problem?
On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:
> The way I solved that, was to have both repositories pointing to each
> other, using alternates.
Ouch. Double un-good. Not a good idea. Especially not if you do
development in both and pull and push between them.
What will happen is that if you do alternates pointing both ways, you
basically end up having a "shared pool of objects". So it's pretty much
equivalent to just using a shared object directory, and it has *exactly*
the same issues with object reachability and references: you have a shared
pool of objects, but you only ever see *one* set of references, so garbage
collection cannot work - because it will always see just a subset of the
real references, while it sees essentially all objects.
> could it be that GC does not handle cyclic alternates correctly?
It's not about cyclic per se: it's about the fact that GC will do garbage
collection based on reachability with the local references.
Which is normally fine. It's normally fine, because the object tree is
"local" too. But when doing alternates:
 - the tree that is being used as an alternate *has* to be totally stable.
   It must *never* have been re-based, or have any GC'able objects in the
   first place. IOW, doing a "git gc" on it will be safe, because there is
   no way any objects that the other alternate depends on could be pruned.

 - You definitely must *not* do a two-way alternate, because that violates
   another rule: the rule that the "alternate base" (which is now *both*
   of the repositories) is self-sufficient. Since they both point to each
   other, there's no way to know whether they are self-sufficient or not:
   they may be re-using each other's objects *and* packs!
And in the above, the "*and* packs" is important, and probably the cause
of your problems. Because "git repack -a -d -l" (which is what "git gc"
does) will always gather up any loose objects even from remote sites, but
the "-l" means that it will not do so for alternate packed objects.
So what happens is that if one of the repositories can reach some object
that is in a pack in the other repository, "git gc" will still *leave* it
dependent on a pack in the other repository. But maybe that object isn't
even reachable in the other repo any more (for whatever reason - a rebase,
whatever). Then when you repack the other repository, all its packs
will be replaced by one new pack - and that one new pack will only contain
the objects reachable from the other repo, so the object the first
repository still depends on is gone.
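To make the pack dependency concrete, here is a toy model in Python. It is only a sketch: the names (repack_local, missing_objects, the object ids "c1", "x" and so on) are made up for illustration, packs are modelled as plain sets, and the "-l" behaviour is simplified to "skip anything already packed in the alternate". It is not how git stores anything.

```python
def repack_local(reachable, alternate_packs):
    """Toy model of "git repack -a -d -l": return the single new pack that
    replaces all old local packs. Objects already sitting in one of the
    alternate's packs are skipped (the assumed effect of "-l")."""
    packed_elsewhere = set().union(*alternate_packs)
    return {obj for obj in reachable if obj not in packed_elsewhere}


def missing_objects(reachable, own_packs, alternate_packs):
    """Objects a repo needs that are in neither its own packs nor the alternate's."""
    available = set().union(*own_packs, *alternate_packs)
    return set(reachable) - available


# B holds one pack; A can reach B's objects plus one object of its own.
b_packs = [{"c1", "c2", "x"}]
a_reachable = {"c1", "c2", "x", "a1"}

# "git gc" in A: A's new pack skips everything packed in B, so A still
# depends on B's pack for "x".
a_packs = [repack_local(a_reachable, b_packs)]

# B rebases: "x" is replaced by "x2" and is no longer reachable in B.
b_reachable = {"c1", "c2", "x2"}

# "git gc" in B: all of B's packs are replaced by one pack of what B reaches.
b_packs = [repack_local(b_reachable, a_packs)]

lost = missing_objects(a_reachable, a_packs, b_packs)
print(sorted(lost))
```

In this model A ends up needing "x", which no pack in either repository contains any more.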
IOW: alternates are dangerous. A shared object directory is dangerous. You
should basically only do it under very controlled circumstances, and
otherwise you should use either hardlinks or if you want added safety,
totally separate repositories.
Basically, here's an example of badness, with A and B being repos that
point to each other.
- do something in A
- pull it into B - this leaves the objects in A, because of the alternates
- rebase A
- "git gc" in A: this removes unreachable objects from A, and now B is
  missing objects that its own references still point to
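The sequence above can be played through with a small Python model. This is a sketch under loud assumptions: each repo is a dict of object id to parent ids, both repos see the union of the two stores (the two-way alternate), and the hypothetical gc() prunes immediately based on local refs only. Real git also has reflogs and prune grace periods, which this ignores.

```python
def walk(visible, tips):
    """Follow parent links from tips; return (reached, missing) object ids."""
    reached, missing, stack = set(), set(), list(tips)
    while stack:
        oid = stack.pop()
        if oid in reached or oid in missing:
            continue
        if oid not in visible:
            missing.add(oid)
            continue
        reached.add(oid)
        stack.extend(visible[oid])
    return reached, missing


def gc(own_store, other_store, own_refs):
    """Toy prune: keep only local objects reachable from the *local* refs."""
    keep, _ = walk({**other_store, **own_store}, own_refs)
    return {oid: parents for oid, parents in own_store.items() if oid in keep}


# do something in A
store_a = {"base": [], "work": ["base"]}
store_b = {}
refs_a = {"work"}

# pull it into B: B gets the ref, the objects stay in A
refs_b = {"work"}

# rebase A: "work" is rewritten as "work2"
store_a["work2"] = ["base"]
refs_a = {"work2"}

# "git gc" in A: only A's refs are consulted
store_a = gc(store_a, store_b, refs_a)

# what can B still resolve through the shared pool?
_, broken = walk({**store_a, **store_b}, refs_b)
print(sorted(broken))
```

The gc in A never looks at refs_b, which is the whole problem: B's tip "work" is unreachable from A's point of view, so it gets pruned out from under B.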
So the rule really is: never *ever* do anything but fast-forward in a repo
that is an alternate for another one. If you do a circular link, I think
it's still safe if you follow that rule, but now obviously the rule holds
for *both* repos (and quite frankly, I'd worry so much that I'd never do
it even then).
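"Fast-forward" here just means the old tip is an ancestor of the new tip, so nothing that was reachable before becomes unreachable. A minimal Python sketch of that check, over a made-up history (is_fast_forward and the commit names are illustrative, not git API; in a real repository "git merge-base --is-ancestor" answers the same question):

```python
def is_fast_forward(parents, old_tip, new_tip):
    """True if old_tip is new_tip itself or one of its ancestors."""
    stack, seen = [new_tip], set()
    while stack:
        oid = stack.pop()
        if oid == old_tip:
            return True
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(parents.get(oid, []))
    return False


# commit id -> parent ids; "work2" is a rebased rewrite of "work"
history = {
    "base": [],
    "work": ["base"],
    "more": ["work"],
    "work2": ["base"],
}

ff_ok = is_fast_forward(history, "work", "more")    # new commit on top of old tip
ff_bad = is_fast_forward(history, "work", "work2")  # rebase: old tip abandoned
print(ff_ok, ff_bad)
```

Moving a branch from "work" to "more" keeps "work" reachable; moving it to "work2" does not, and that is exactly the kind of update that leaves prunable objects behind in an alternate.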
There should be another rule too: git on its own is not a backup system.
You can use git *as* a backup system, but you need to do so by mirroring
the whole repository, and not on the same disk.
(i.e., for me, git *is* a backup system, but that's only because I push my
repos to other sites - a single git repo on its own has zero redundancy)