Re: Simultaneous gc and repack

On Thu, Apr 13, 2017 at 10:31 AM, David Turner <novalis@xxxxxxxxxxx> wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. But git repack doesn't respect this lock, so
> it's possible to have a repack running at the same time as a gc.  This
> makes the gc sad when its packs are deleted out from under it with:
> "fatal: ./objects/pack/pack-$sha.pack cannot be accessed".  Then it
> dies, leaving a large temp file hanging around.
>
> Does the following seem reasonable?
>
> 1. Make git repack, by default, check for a gc.pid file (using the same
> logic as git gc itself does).
> 2. Provide a --force option to git repack to ignore said check.
> 3. Make git gc provide that --force option when it calls repack under
> its own lock.
>

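For concreteness, here is a rough Python sketch of the kind of check step 1 describes. It is not git's actual code. It assumes gc.pid holds "<pid> <hostname>" and that a lock older than 12 hours is stale, which is my understanding of what git gc does; the function name gc_lock_holder is made up for illustration, and it is POSIX-only.

```python
import os
import socket
import time

# Assumption: a lock file older than this is treated as stale.
STALE_AFTER_SECONDS = 12 * 3600


def gc_lock_holder(git_dir, now=None):
    """Return the pid recorded in gc.pid if the lock looks live, else None."""
    path = os.path.join(git_dir, "gc.pid")
    try:
        st = os.stat(path)
        with open(path) as f:
            fields = f.read().split()
    except OSError:
        return None  # no lock file (or unreadable): nothing to respect

    # Assumed file format: "<pid> <hostname>"
    if len(fields) != 2:
        return None
    try:
        pid = int(fields[0])
    except ValueError:
        return None
    if pid <= 0:
        return None

    now = time.time() if now is None else now
    if now - st.st_mtime > STALE_AFTER_SECONDS:
        return None  # stale lock, ignore it

    if fields[1] != socket.gethostname():
        return pid  # cannot probe a process on another host; be conservative

    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return None  # the recorded process is gone
    except PermissionError:
        pass  # process exists but belongs to another user
    return pid
```

A repack wrapper would skip or defer the repack whenever this returns a pid, unless the caller asked for the equivalent of --force.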
What about making the code that calls repack today call gc instead? I
guess gc does more work than you strictly need, but..?

Thanks,
Jake

> This came up because Gitlab runs a repack after every N pushes and a gc
> after every M commits, where M >> N.  Sometimes, when pushes come in
> rapidly, the repack catches the still-running gc and the above badness
> happens.  At least, that's my understanding: I don't run our Gitlab
> servers, but I talked to the person who does and that's what he said.
>
> Of course, Gitlab could do its own locking, but the general approach
> seems like it would help other folks too.
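
If a caller did want to do its own locking in the meantime, one possible shape is an advisory flock held around every maintenance command. This is only a sketch under assumptions: the lock file name is hypothetical, run_exclusive is an illustrative helper rather than anything GitLab or git provides, and it is POSIX-only.

```python
import fcntl
import os
import subprocess


def run_exclusive(lock_path, argv):
    """Run argv while holding an exclusive advisory lock on lock_path.

    Returns the command's exit code, or None if another holder already
    has the lock (in which case the command is not run at all).
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking: fail immediately instead of queueing behind a gc.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        return subprocess.run(argv).returncode
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Both the repack and the gc invocations would go through the same lock file, for example run_exclusive(os.path.join(repo, "maintenance.lock"), ["git", "-C", repo, "repack", "-d"]), so whichever starts second is skipped rather than deleting packs out from under the first.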