Re: Simultaneous gc and repack
- Date: Thu, 13 Apr 2017 11:35:16 -0700
- From: Jacob Keller <jacob.keller@xxxxxxxxx>
On Thu, Apr 13, 2017 at 11:28 AM, David Turner <novalis@xxxxxxxxxxx> wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
>> On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
>> > On Thu, Apr 13, 2017 at 10:31 AM, David Turner <novalis@xxxxxxxxxxx> wrote:
>> > > Git gc locks the repository (using a gc.pid file) so
>> > > that other gcs don't run concurrently. But git repack
>> > > doesn't respect this lock, so it's possible to have a
>> > > repack running at the same time as a gc. This makes
>> > > the gc sad when its packs are deleted out from under it
>> > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
>> > > accessed". Then it dies, leaving a large temp file
>> > > hanging around.
>> > >
>> > > Does the following seem reasonable?
>> > >
>> > > 1. Make git repack, by default, check for a gc.pid file
>> > > (using the same logic as git gc itself does).
>> > > 2. Provide a --force option to git repack to ignore said
>> > > check.
>> > > 3. Make git gc provide that --force option when
>> > > it calls repack under its own lock.
>> > What about making the code that calls repack today
>> > just call gc instead? I guess it's more work if you don't
>> > strictly need it, but...?
>> There are many scenarios where this does not achieve the
>> same thing. On the obvious side, gc does more than
>> repacking, but on the other side, repack has many
>> switches that are not available via gc.
>> Would it make more sense to move the lock into repack
>> instead of gc?
> Other gc operations might step on each other too (e.g. packing refs).
> That would be less bad (and less common), but it still seems worth
It sounds like your original solution would work, though I wouldn't
call the option "force", and I would either leave it undocumented or
document it as "this is only meant to be used by git-gc internally".
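For illustration only, the repack-side check proposed in steps 1-3 of the original message could look roughly like the sketch below. It is written in Python rather than git's C, and every name in it (gc_lock_held, repack, ignore_gc_lock, STALE_SECONDS) is hypothetical. It assumes, loosely following how git gc is believed to treat gc.pid but without checking the source, that the file holds "<pid> <hostname>", that a file older than 12 hours is stale, and that the system is POSIX, where os.kill(pid, 0) only probes for the process. The ignore_gc_lock parameter stands in for the proposed --force option, under whatever name it ends up with.

```python
import os
import socket
import time

# Assumption: a gc.pid file older than this is treated as stale.
STALE_SECONDS = 12 * 3600


def gc_lock_held(git_dir: str) -> bool:
    """Return True if git_dir/gc.pid appears to belong to a running gc."""
    path = os.path.join(git_dir, "gc.pid")
    try:
        st = os.stat(path)
        with open(path) as f:
            fields = f.read().split()
    except OSError:
        return False  # no (readable) lock file: assume no gc is running
    if time.time() - st.st_mtime > STALE_SECONDS:
        return False  # stale file left behind by a gc that died
    if len(fields) != 2 or not fields[0].isdigit():
        return False  # malformed contents: ignore the file
    pid, host = int(fields[0]), fields[1]
    if host != socket.gethostname():
        return True  # cannot probe a process on another machine; assume live
    try:
        os.kill(pid, 0)  # POSIX: signal 0 checks existence, sends nothing
    except ProcessLookupError:
        return False  # the recorded gc process is gone
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def repack(git_dir: str, ignore_gc_lock: bool = False) -> str:
    """Hypothetical repack entry point that respects the gc lock."""
    # ignore_gc_lock stands in for the proposed --force option, which
    # git gc would pass when calling repack under its own lock.
    if not ignore_gc_lock and gc_lock_held(git_dir):
        return "skipped: gc in progress"
    return "repacked"  # placeholder for the actual repacking work
```

A check like this only narrows the window: a gc could still start between the check and the repack, so a real implementation would presumably need repack to hold the lock rather than merely read it.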