Web lists-archives.com

Re: [PATCH 07/10] t1404: demonstrate two problems with reference transactions




On Tue, Aug 29, 2017 at 1:20 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> Currently, a loose reference is deleted even before locking the
> `packed-refs` file, let alone deleting any packed version of the
> reference. This leads to two problems, demonstrated by two new tests:
>
> * While a reference is being deleted, other processes might see the
>   old, packed value of the reference for a moment before the packed
>   version is deleted. Normally this would be hard to observe, but we
>   can prolong the window by locking the `packed-refs` file externally
>   before running `update-ref`, then unlocking it before `update-ref`'s
>   attempt to acquire the lock times out.
>
> * If the `packed-refs` file is locked so long that `update-ref` fails
>   to lock it, then the reference can be left permanently in the
>   incorrect state described in the previous point.
>
> In a moment, both problems will be fixed.
>
> Signed-off-by: Michael Haggerty <mhagger@xxxxxxxxxxxx>
> ---
> The first of the new tests is rather involved; it uses two background
> processes plus a foreground process that polls the value of a
> reference. But despite sensitivity to timing, I think it should be
> robust even in this broken state. Once the functionality being tested
> is fixed, this test should never produce false positives, though
> really bad timing (e.g., if it takes more than a second for
> `update-ref` to get going) could still lead to false negatives.
>
> Each of the new tests takes about a second to run because they
> simulate lock contention.
>
> If anybody has suggestions for better ways to test these things,
> please speak up :-)
>
>  t/t1404-update-ref-errors.sh | 71 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>
> diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
> index c34ece48f5..752f83c377 100755
> --- a/t/t1404-update-ref-errors.sh
> +++ b/t/t1404-update-ref-errors.sh
> @@ -404,4 +404,75 @@ test_expect_success 'broken reference blocks indirect create' '
>         test_cmp expected output.err
>  '
>
> +test_expect_failure 'no bogus intermediate values during delete' '
> +       prefix=refs/slow-transaction &&
> +       # Set up a reference with differing loose and packed versions:
> +       git update-ref $prefix/foo $C &&
> +       git pack-refs --all &&
> +       git update-ref $prefix/foo $D &&
> +       git for-each-ref $prefix >unchanged &&
> +       # Now try to update the reference, but hold the `packed-refs` lock
> +       # for a while to see what happens while the process is blocked:
> +       : >.git/packed-refs.lock &&
> +       test_when_finished "rm -f .git/packed-refs.lock" &&
> +       {
> +               sleep 1 &&
> +               rm -f .git/packed-refs.lock &
> +       } &&
> +       pid1=$! &&
> +       {
> +               # Note: the following command is intentionally run in the
> +               # background. We extend the timeout so that `update-ref`
> +               # tries to acquire the `packed-refs` lock longer than it
> +               # takes the background process above to delete it:
> +               git -c core.packedrefstimeout=2000 update-ref -d $prefix/foo &
> +       } &&
> +       pid2=$! &&
> +       ok=true &&
> +       while kill -0 $pid2 2>/dev/null

    If sig is 0, then no signal is sent, but error checking is still
    performed; this can be used to check for the existence of a
    process ID or process group ID.

So the kill -0 is the idiomatic form of "while $pid2 is still alive"?
ignoring errors due to the dev/null redirection?

And due to the nature of this test we have to have a busy
loop, we cannot rate limit the cpu usage inside the loop
via some shorter sleeps, as ideally we want to observe
the ref at any time.

    In an ideal world this test would instruct the kernel to interrupt
    the executing program (update-ref) at certain events such as
    touching/writing/deleting files and in each interrupt we could
    inspect the file system in a read only fashion.


> +       do
> +               sha1=$(git rev-parse --verify --quiet $prefix/foo || echo undefined) &&
> +               case "$sha1" in
> +               $D)
> +                       # This is OK; it just means that nothing has happened yet.
> +                       : ;;
> +               undefined)
> +                       # This is OK; it means the deletion was successful.
> +                       : ;;
> +               $C)
> +                       # This value should never be seen. Probably the loose
> +                       # reference has been deleted but the packed reference
> +                       # is still there:
> +                       echo "$prefix/foo incorrectly observed to be C" &&
> +                       break
> +                       ;;
> +               *)
> +                       # WTF?
> +                       echo "$prefix/foo unexpected value observed: $sha1" &&
> +                       break
> +                       ;;
> +               esac
> +       done >out &&
> +       wait $pid1 &&
> +       wait $pid2 &&

oh, you use explicit pids here to check each exit code.

> If anybody has suggestions for better ways to test these things,
> please speak up :-)

I don't think I'd have a satisfactory answer to that, as the timing is inherent
to the things we test. In other software projects that are less low level, I
would have suggested to use a time/clock mock, which can be stopped
and then inspection can be performed at defined states.