Web lists-archives.com

BUG: Race condition due to reflog expiry in "gc"




I'm still working on fixing a race condition I encountered in "gc"
recently, but am not 100% sure of the fix. So I thought I'd send a
braindump of what I have so far in case it jolts any memories.

The problem is that when we "gc" we run "reflog expire --all". This
iterates over the reflogs, and takes a *.lock for each reference.

It'll fail intermittendly in two ways:

 1. If something is concurrently committing to the repo it'll fail
    because we for a tiny amount of time hold a HEAD.lock file, so HEAD
    can't be updated.

 2. On a repository that's just being "git fetch"'d by some concurrent
    process the "gc" will fail, because the ref's SHA1 has changed,
    which we inspect as we aquire the lock.

To reproduce this have a repo getting a bunch of commits and "fetch"
from it in a loop[A]. Then try to "gc" in it (or "reflog expire
--all"). You'll get case #2 above (#1 can also be manually reproduce):

    $ git gc; echo $?
    error: cannot lock ref 'refs/remotes/origin/HEAD': ref 'refs/remotes/origin/HEAD' is at d13c708ae0435c88c8f7580442b70df9a40224d0 but expected 7fb5b7dd9988d040a9d29c0456874472bea2b439
    error: cannot lock ref 'refs/remotes/origin/master': ref 'refs/remotes/origin/master' is at d13c708ae0435c88c8f7580442b70df9a40224d0 but expected 7fb5b7dd9988d040a9d29c0456874472bea2b439
    fatal: failed to run reflog
    128

This behavior is due to a combination of:

 - 62aad1849f ("gc --auto: do not lock refs in the background", 2014-05-25)
 - f3b661f766 ("expire_reflog(): use a lock_file for rewriting the
   reflog file", 2014-12-12)

Discussion around those changes:

    https://public-inbox.org/git/1400978309-25235-1-git-send-email-pclouds@xxxxxxxxx/
    https://public-inbox.org/git/CAGZ79kaPzMPcxMqsTeW5LjYNc7LbpjHE5eBPHWKvUkBU3NvGJw@xxxxxxxxxxxxxx/

I haven't yet found a reason for why we'd intrinsically need this lock,
and indeed this doesn't produce corruption AFAICT:

    diff --git a/refs/files-backend.c b/refs/files-backend.c
    index ef053f716c..ffb9fbbf3a 100644
    --- a/refs/files-backend.c
    +++ b/refs/files-backend.c
    @@ -3043,7 +3043,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
     	if (!lock) {
     		error("cannot lock ref '%s': %s", refname, err.buf);
     		strbuf_release(&err);
    -		return -1;
    +		return 0;
     	}
     	if (!refs_reflog_exists(ref_store, refname)) {
     		unlock_ref(lock);

But that just "solves" #2, not #1, and also no-oping the OID check in
verify_lock() still has all tests pass, so we're likely missing the
relevant race condition tests.

A.

    (
        cd /tmp &&
        rm -rf git &&
        git init git &&
        cd git &&
        while true
        do
            head -c 10 /dev/urandom | hexdump >out &&
            git add out &&
            git commit -m"out"
        done
    )

    (
        cd /tmp &&
        rm -rf git-clone &&
        git clone file:///tmp/git git-clone &&
        cd git-clone &&
        while true
        do
            git reset --hard &&
            git pull
        done
    )