Re: Simultaneous gc and repack
- Date: Thu, 13 Apr 2017 11:03:14 -0700
- From: Jacob Keller <jacob.keller@xxxxxxxxx>
- Subject: Re: Simultaneous gc and repack
On Thu, Apr 13, 2017 at 10:31 AM, David Turner <novalis@xxxxxxxxxxx> wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. But git repack doesn't respect this lock, so
> it's possible to have a repack running at the same time as a gc. This
> makes the gc sad when its packs are deleted out from under it with:
> "fatal: ./objects/pack/pack-$sha.pack cannot be accessed". Then it
> dies, leaving a large temp file hanging around.
> Does the following seem reasonable?
> 1. Make git repack, by default, check for a gc.pid file (using the same
> logic as git gc itself does).
> 2. Provide a --force option to git repack to ignore said check.
> 3. Make git gc provide that --force option when it calls repack under
> its own lock.
What about having the code that calls repack today call gc instead?
I guess gc does more work than is strictly needed, but...?
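For reference, the check proposed in step 1 could look roughly like the sketch below. It is Python rather than git's C, and it rests on my understanding (an assumption worth verifying against builtin/gc.c) that gc.pid holds "<pid> <hostname>" and that a lock older than about 12 hours is treated as stale. The function name gc_lock_holder is made up for illustration, and the liveness probe assumes a POSIX system.

```python
import os
import socket
import time

# Assumption: mirrors the staleness window that git gc is believed to use.
STALE_AFTER_SECONDS = 12 * 60 * 60


def gc_lock_holder(git_dir):
    """Return the pid recorded in gc.pid if that gc still looks alive, else None."""
    pid_path = os.path.join(git_dir, "gc.pid")
    try:
        age = time.time() - os.stat(pid_path).st_mtime
        with open(pid_path) as f:
            fields = f.read().split()
    except OSError:
        return None  # no lock file (or unreadable): nothing to respect
    if age > STALE_AFTER_SECONDS:
        return None  # stale lock left behind by a gc that died
    if len(fields) != 2 or not fields[0].isdigit() or int(fields[0]) <= 0:
        return None  # malformed lock file
    pid, host = int(fields[0]), fields[1]
    if host != socket.gethostname():
        return pid  # cannot probe a process on another host, so assume it runs
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return None  # recorded process is gone
    except PermissionError:
        pass  # process exists but belongs to another user
    return pid
```

A repack following the proposal would bail out when this returns a pid, unless --force was given.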
> This came up because Gitlab runs a repack after every N pushes and a gc
> after every M commits, where M >> N. Sometimes, when pushes come in
> rapidly, the repack catches the still-running gc and the above badness
> happens. At least, that's my understanding: I don't run our Gitlab
> servers, but I talked to the person who does and that's what he said.
> Of course, Gitlab could do its own locking, but the general approach
> seems like it would help other folks too.
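If the caller did its own locking in the meantime, a minimal sketch could wrap both commands in one advisory lock. I don't know how Gitlab actually invokes git, so the helper name run_exclusive and the lock file path are hypothetical, and flock() is POSIX-only.

```python
import fcntl
import subprocess


def run_exclusive(lock_path, cmd):
    """Run cmd while holding an exclusive advisory lock on lock_path.

    Returns the command's exit code, or None if another holder already
    has the lock (the caller can then skip or retry later).
    """
    with open(lock_path, "a") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is dropped when lock_file is closed on leaving the block.
        return subprocess.run(cmd).returncode


# Hypothetical usage, with both maintenance paths sharing one lock file:
#   run_exclusive("/path/to/repo.git/maintenance.lock",
#                 ["git", "-C", "/path/to/repo.git", "repack", "-d"])
#   run_exclusive("/path/to/repo.git/maintenance.lock",
#                 ["git", "-C", "/path/to/repo.git", "gc"])
```

Returning None rather than blocking means a repack that arrives while gc is running gets skipped, which seems harmless since gc repacks anyway.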