Web lists-archives.com

Re: [PATCH 08/10] files_ref_store: use a transaction to update packed refs

On Tue, Aug 29, 2017 at 10:20:32AM +0200, Michael Haggerty wrote:

> First, the old code didn't obtain the `packed-refs` lock until
> `files_transaction_finish()`. This means that a failure to acquire the
> `packed-refs` lock (e.g., due to contention with another process)
> wasn't detected until it was too late (problems like this are supposed
> to be detected in the "prepare" phase). The new code acquires the
> `packed-refs` lock in `files_transaction_prepare()`, the same stage of
> the processing when the loose reference locks are being acquired,
> removing another reason why the "prepare" phase might succeed and the
> "finish" phase might nevertheless fail.

That means we're holding the packed-refs lock for a slightly longer
period. I think this could mean worse lock contention between otherwise
unrelated transactions over the packed-refs file. I wonder if the
lock-retry timeout might need to be increased to accommodate this. On
the other hand, it looks like we take it after getting the individual
locks, which I'd think would be the expensive part.

Are there other callers who take the packed-refs and individual locks in
the reverse order? I think git-pack-refs might. Could we "deadlock" with
pack-refs? There are timeouts so it would resolve itself quickly, but I
wonder if we'd have a case that would deadlock-until-timeout that
otherwise could succeed.

I guess such a deadlock would exist today, as well, if we take the
packed-refs lock before giving up the individual loose ref locks.

> Second, the old code deleted the loose version of a reference before
> deleting any packed version of the same reference. This left a moment
> when another process might think that the packed version of the
> reference is current, which is incorrect. (Even worse, the packed
> version of the reference can be arbitrarily old, and might even point
> at an object that has since been garbage-collected.)
> Third, if a reference deletion fails to acquire the `packed-refs` lock
> altogether, then the old code might leave the repository in the
> incorrect state (possibly corrupt) described in the previous
> paragraph.

And to think I had the hubris to claim a few weeks ago that we had
probably weeded out all of the weird packed-refs inconsistency and
race-condition bugs. :)

Good find.