Re: [PATCH 3/6] fetch-pack: in protocol v2, enqueue commons first
- Date: Tue, 5 Jun 2018 16:30:26 -0700
- From: Jonathan Nieder <jrnieder@xxxxxxxxx>
Jonathan Tan wrote:

> In do_fetch_pack_v2(), rev_list_insert_ref_oid() is invoked before
> everything_local(). This means that if we have a commit that is both our
> ref and their ref, it would be enqueued by rev_list_insert_ref_oid() as
> SEEN, and since it is thus already SEEN, everything_local() would not
> enqueue it.
>
> If everything_local() were invoked first, as it is in do_fetch_pack()
> for protocol v0, then everything_local() would enqueue it with
> COMMON_REF | SEEN. The addition of COMMON_REF ensures that its parents
> are not sent as "have" lines.
>
> Change the order in do_fetch_pack_v2() to be consistent with
> do_fetch_pack(), and to avoid sending unnecessary "have" lines.

I get lost in the above description. I suspect it's doing a good job
of describing the code, instead of answering the question I really
have about what is broken and what behavior we want instead.

E.g. are there some commands that I can run to trigger the unnecessary
"have" lines? That would make it easier for me to understand the rest
and whether this is a good approach for suppressing them.

It's possible I should be skipping to the test, but a summary in the
commit message would make life easier for lazy people like me. :)
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1372,14 +1372,14 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> for_each_ref(clear_marks, NULL);
> marked = 1;
> - for_each_ref(rev_list_insert_ref_oid, NULL);
> - for_each_cached_alternate(insert_one_alternate_object);
> /* Filter 'ref' by 'sought' and those that aren't local */
> if (everything_local(args, &ref, sought, nr_sought))
> state = FETCH_DONE;
> else
> state = FETCH_SEND_REQUEST;
> + for_each_ref(rev_list_insert_ref_oid, NULL);
> + for_each_cached_alternate(insert_one_alternate_object);
This is subtle. My instinct would be to assume that the purpose of
everything_local is just to determine whether we need to send a fetch
request, but it appears we also want to rely on a side effect from it.
Could everything_local get a function comment to describe what side
effects we will be counting on from it?
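
To check my own reading of the ordering issue, here is a toy model of
it. This is only a sketch under assumptions: it tracks the flag word of
a single commit that is both our ref and their ref, the bit values are
illustrative, and insert_ref and mark_common_ref are made-up stand-ins
for rev_list_insert_ref_oid() and for the side effect of
everything_local(), not the real fetch-pack.c code.

```shell
#!/bin/sh
# Toy model only: one commit's flag word, with illustrative bit values.
SEEN=8
COMMON_REF=4

# Stand-in for rev_list_insert_ref_oid(): enqueue as SEEN unless the
# commit is already SEEN.
insert_ref() {
	if [ $((flags & SEEN)) -eq 0 ]; then
		flags=$((flags | SEEN))
	fi
}

# Stand-in for the side effect of everything_local() on a commit that is
# both our ref and their ref: enqueue as COMMON_REF | SEEN unless the
# commit is already SEEN.
mark_common_ref() {
	if [ $((flags & SEEN)) -eq 0 ]; then
		flags=$((flags | COMMON_REF | SEEN))
	fi
}

# Order in do_fetch_pack_v2() before this patch: insert first, so the
# later call finds the commit already SEEN and never adds COMMON_REF.
flags=0
insert_ref
mark_common_ref
v2_before=$flags

# Order in do_fetch_pack() for v0, and in v2 with this patch applied.
flags=0
mark_common_ref
insert_ref
v0_order=$flags

echo "v2 before patch: $v2_before, v0 order: $v0_order"
```

If this model matches the intent, only the second ordering leaves
COMMON_REF set, and that is the side effect a function comment on
everything_local could spell out.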
> case FETCH_SEND_REQUEST:
> if (send_fetch_request(fd, args, ref, &common,
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 0680dec80..ad6a50ad6 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -808,6 +808,41 @@ test_expect_success 'fetch with --filter=blob:limit=0' '
> fetch_filter_blob_limit_zero server server
> +test_expect_success 'use ref advertisement to prune "have" lines sent' '
nit: this adds the new test as last in the script. Is there some
logical earlier place in the file it can go instead? That way, the
file stays organized and concurrent patches that modify the same test
script are less likely to conflict.
> + rm -rf server client &&
> + git init server &&
> + test_commit -C server aref_both_1 &&
> + git -C server tag -d aref_both_1 &&
> + test_commit -C server aref_both_2 &&
What does aref stand for?
> + # The ref name that only the server has must be a prefix of all the
> + # others, to ensure that the client has the same information regardless
> + # of whether protocol v0 (which does not have ref prefix filtering) or
> + # protocol v2 (which does) is used.
Must, or else what? Maybe:
# In this test, the ref name that only the server has is a prefix of
# all other refs. This ensures that the client has the same information
# regardless of [etc]
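
For what it's worth, my understanding of the prefix filtering the
comment refers to is roughly the following. It is a sketch, not the
server's implementation: the ref list and the single prefix are
assumptions chosen to resemble this test, and a real client may send
several prefixes.

```shell
#!/bin/sh
# Illustrative only: refs the server in this test might hold, and one of
# the prefixes a protocol v2 client could send when asked to fetch "aref".
server_refs="refs/heads/master refs/tags/aref refs/tags/aref_both_2"
prefix="refs/tags/aref"

# Keep only the refs whose name starts with the prefix.
advertised=""
for ref in $server_refs; do
	case "$ref" in
	"$prefix"*) advertised="$advertised $ref" ;;
	esac
done

echo "advertised:$advertised"
```

Because "aref" is a prefix of "aref_both_2", the filtered list still
contains the ref that both sides have, which seems to be the point of
the naming.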
> + git clone server client &&
> + test_commit -C server aref &&
> + test_commit -C client aref_client &&
> + # In both protocol v0 and v2, ensure that the parent of aref_both_2 is
> + # not sent as a "have" line.
Why shouldn't it be sent as a "have" line? E.g. does another "have"
line make it redundant?
> + rm -f trace &&
> + cp -r client clientv0 &&
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv0 \
> + fetch origin aref &&
> + grep "have $(git -C client rev-parse aref_client)" trace &&
> + grep "have $(git -C client rev-parse aref_both_2)" trace &&
nit: this can be made more robust by first doing
aref_client=$(git -C client rev-parse aref_client) &&
aref_both_2=$(git -C client rev-parse aref_both_2) &&
and then using "have $aref_client" and "have $aref_both_2" in the greps.
This way, if the git command fails, the test fails.
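
A small illustration of the difference, with no git involved. The
helper "fails" is hypothetical and merely stands in for a rev-parse
that exits non-zero:

```shell
#!/bin/sh
# Hypothetical stand-in for a failing "git rev-parse": prints nothing
# useful and exits with status 1.
fails() {
	echo ""
	return 1
}

# Inline substitution: only the outer command's exit status is visible,
# so the failure inside $(...) goes unnoticed.
echo "have $(fails)" >/dev/null
inline_status=$?

# Plain assignment: the substitution's exit status becomes the status of
# the assignment, so an &&-chain would stop here.
assign_status=0
oid=$(fails) || assign_status=$?

echo "inline: $inline_status, assignment: $assign_status"
```

In the inline form the pattern would also collapse to just "have ",
which could match unrelated lines in the trace.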
> + ! grep "have $(git -C client rev-parse aref_both_2^)" trace &&
Thanks for a pleasant read,