Simultaneous gc and repack
- Date: Thu, 13 Apr 2017 13:31:38 -0400
- From: David Turner <novalis@xxxxxxxxxxx>
- Subject: Simultaneous gc and repack
Git gc locks the repository (using a gc.pid file) so that other gcs
don't run concurrently. But git repack doesn't respect this lock, so
it's possible to have a repack running at the same time as a gc. The
gc then fails when its packs are deleted out from under it, with:
"fatal: ./objects/pack/pack-$sha.pack cannot be accessed". It dies,
leaving a large temp file hanging around.
Does the following seem reasonable?
1. Make git repack, by default, check for a gc.pid file (using the same
logic as git gc itself does).
2. Provide a --force option to git repack to ignore said check.
3. Make git gc provide that --force option when it calls repack under
its own lock.
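A minimal sketch of the check in items 1 and 2, written in Python for brevity rather than in Git's C. It assumes gc.pid holds "<pid> <hostname>", that a lock file older than 12 hours is stale, and that a lock from the local host is stale if its process is gone. These rules are modeled on what git gc is understood to do and should be verified against builtin/gc.c. The names gc_lock_holder and process_exists are hypothetical, the force parameter stands in for the proposed --force option, and the liveness test is POSIX-only.

```python
import os
import socket
import time

# Assumed staleness cutoff for this sketch (12 hours).
STALE_AFTER_SECONDS = 12 * 60 * 60


def process_exists(pid):
    """Return True if a process with this pid appears to exist (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence/permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def gc_lock_holder(git_dir, force=False, now=None):
    """Return (pid, host) if a gc appears to hold the lock, else None.

    force=True skips the check, as the proposed --force option would.
    """
    if force:
        return None
    pid_path = os.path.join(git_dir, "gc.pid")
    try:
        with open(pid_path) as fh:
            fields = fh.read().split()
        mtime = os.path.getmtime(pid_path)
    except OSError:
        return None  # no readable lock file: nothing is holding the lock
    if len(fields) < 2:
        return None  # malformed lock file: treat as not held
    try:
        pid = int(fields[0])
    except ValueError:
        return None
    if pid <= 0:
        return None
    host = fields[1]
    now = time.time() if now is None else now
    if now - mtime > STALE_AFTER_SECONDS:
        return None  # lock is old enough to be considered stale
    if host == socket.gethostname() and not process_exists(pid):
        return None  # local holder has died: lock is stale
    return pid, host
```

A repack wrapper would call gc_lock_holder(git_dir) first and refuse to run if it returns a holder, while gc itself would pass force=True when it invokes repack under its own lock.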
This came up because GitLab runs a repack after every N pushes and a gc
after every M commits, where M >> N. Sometimes, when pushes come in
rapidly, the repack catches the still-running gc and the above badness
happens. At least, that's my understanding: I don't run our GitLab
servers, but I talked to the person who does and that's what he said.
Of course, GitLab could do its own locking, but the general approach
seems like it would help other folks too.
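For comparison, the "do its own locking" alternative could look roughly like the following sketch, in which a hosting service serializes its own gc and repack invocations with an advisory file lock. This is an assumption-laden illustration, not how GitLab actually works: the lock file name maintenance.lock and the helpers maintenance_lock and run_maintenance are hypothetical, and fcntl.flock is only available on Unix-like systems.

```python
import contextlib
import fcntl
import os
import subprocess


@contextlib.contextmanager
def maintenance_lock(git_dir, blocking=True):
    """Hold an exclusive advisory lock on a per-repository lock file.

    Yields True if the lock was acquired, False if blocking=False and
    another holder already has it.
    """
    path = os.path.join(git_dir, "maintenance.lock")  # hypothetical name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
        try:
            fcntl.flock(fd, flags)
        except BlockingIOError:
            yield False  # someone else is running maintenance
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def run_maintenance(git_dir, args):
    """Run a git maintenance command (e.g. ["gc"] or ["repack", "-d"])
    while holding the lock, so that the two never overlap."""
    with maintenance_lock(git_dir):
        cmd = ["git", "--git-dir=" + git_dir] + list(args)
        return subprocess.run(cmd, check=False).returncode
```

The drawback is the one noted above: a lock like this only protects callers that go through the wrapper, whereas a check inside git repack would cover everyone.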