Web lists-archives.com

More about blob reachability (while fetching arbitrary blobs)




There has been some talk about fetching blobs from repos with missing
objects [1] [2]. I took a further look at the issue of blob
reachability.

I have written earlier that there is a bug in rev-list when used to
compute reachability [3], but someone pointed me to bitmaps, and I found
out that "--use-bitmap-index" makes rev-list use an alternative code
path in the presence of a bitmap (whether fully covering or partially
covering the relevant objects), and that code doesn't contain that bug.
If reachability checks are desired, bitmaps are probably required for
performance anyway [4].

We should implement things so that users will be strongly recommended to
do one of the following:
 1) configure explicitly (instead of it being the default) the
    equivalent of allowanysha1inwant to disable reachability checks, or
 2) use a bitmap (and it is recommended to periodically repack).

(This strong recommendation is currently a requirement due to the fact
that the 3rd option, "do nothing", is not possible due to the bug in
rev-list described above.)

I have included a small patch (below) to do that for upload-pack. (For
the case of upload-pack, which until now does not really need to process
blob wants, a possible alternative is to forbid trees and blobs from
being processed by upload-pack at all.)

[1] <20170304191901.9622-1-markbt@xxxxxxxxxx>
[2] <1488999039-37631-1-git-send-email-git@xxxxxxxxxxxxxxxxx>
[3] <cover.1487984670.git.jonathantanmy@xxxxxxxxxx>
[4] I checked, and got 0m0.281s for "time git rev-list --objects
    --use-bitmap-index HEAD^{tree} --not --all" on a fully repacked
    git.git. (Not sure whether that's OK performance.)

-- 8< --
Subject: [PATCH] upload-pack: forbid fetching unreachable blobs

If allowreachablesha1inwant is set, upload-pack will provide a blob to a
user, provided its hash, regardless of whether the blob is reachable or
not.

Partially fix this by (i) including the "--objects" argument, if
necessary, when invoking "rev-list", and (ii) including the
"--use-bitmap-index" argument.

(i) is so that rev-list operates on the correct granularity, depending
on the type of objects checked.

(ii) needs more explanation:

  Our invocation of "rev-list" is supposed to return `A - B`, where `A`
  is the set of all objects reachable from the objects that we want to
  check the reachability of, and `B` is the set of all objects reachable
  from any of our refs. If all our objects are reachable, then the
  resulting set is empty.  However, the non-bitmap-using part of the
  code wrongly excludes some trees and blobs from `B`, making the
  resulting set sometimes non-empty and thus wrongly concluding that an
  object is unreachable when it is, in fact, reachable.  (However, the
  error does not run the other way - an object will not be indicated as
  reachable when it is unreachable.)

  The "--use-bitmap-index" argument partially solves this problem by
  activating another code path when the serving repository has a bitmap.
  This other code path does not contain the bug described above.

("--use-bitmap-index" has other benefits besides correctness, but has
not yet been included likely because bitmaps were introduced in commit
fff4275 ("pack-bitmap: add support for bitmap indexes", 2013-12-21),
later than the last time this part of the code was touched in commit
051e400 ("helping smart-http/stateless-rpc fetch race", 2011-08-05)).

Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
---
 t/t5500-fetch-pack.sh | 31 +++++++++++++++++++++++++++++++
 upload-pack.c         | 17 ++++++++++++++++-
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 505e1b4a7..1568aed82 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,37 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
 	git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'setup for tests that fetch blobs by hash' '
+	git init blobserver &&
+	test_commit -C blobserver 1 &&
+	git -C blobserver repack -a -d --write-bitmap-index &&
+	test_commit -C blobserver 2 &&
+	test_commit -C blobserver 3 &&
+	blob1=$(echo 1 | git hash-object --stdin) &&
+	blob2=$(echo 2 | git hash-object --stdin) &&
+	blob3=$(echo 3 | git hash-object --stdin) &&
+
+	unreachable=$(echo 4 | git -C blobserver hash-object -w --stdin) &&
+	git -C blobserver cat-file -e "$unreachable"
+'
+
+test_expect_success 'fetch-pack can fetch reachable blobs by hash' '
+	test_config -C blobserver uploadpack.allowreachablesha1inwant 1 &&
+
+	git init reachabletest &&
+	git -C reachabletest fetch-pack ../blobserver "$blob1" "$blob2" &&
+	git -C reachabletest cat-file -e "$blob1" &&
+	git -C reachabletest cat-file -e "$blob2" &&
+	test_must_fail git -C reachabletest cat-file -e "$blob3"
+'
+
+test_expect_success 'fetch-pack cannot fetch unreachable blobs' '
+	test_config -C blobserver uploadpack.allowreachablesha1inwant 1 &&
+
+	git init unreachabletest &&
+	test_must_fail git -C unreachabletest fetch-pack ../blobserver "$blob1" "$unreachable"
+'
+
 check_prot_path () {
 	cat >expected <<-EOF &&
 	Diag: url=$1
diff --git a/upload-pack.c b/upload-pack.c
index 7597ba340..a00f0034c 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -469,7 +469,10 @@ static int do_reachable_revlist(struct child_process *cmd,
 				struct object_array *reachable)
 {
 	static const char *argv[] = {
-		"rev-list", "--stdin", NULL,
+		"rev-list", "--use-bitmap-index", "--stdin", NULL,
+	};
+	static const char *argv_with_objects[] = {
+		"rev-list", "--use-bitmap-index", "--objects", "--stdin", NULL,
 	};
 	struct object *o;
 	char namebuf[42]; /* ^ + SHA-1 + LF */
@@ -488,6 +491,18 @@ static int do_reachable_revlist(struct child_process *cmd,
 	 */
 	sigchain_push(SIGPIPE, SIG_IGN);
 
+	/*
+	 * If we are testing reachability of a tree or blob, rev-list needs to
+	 * operate more granularly.
+	 */
+	for (i = 0; i < src->nr; i++) {
+		o = src->objects[i].item;
+		if (o->type == OBJ_TREE || o->type == OBJ_BLOB) {
+			cmd->argv = argv_with_objects;
+			break;
+		}
+	}
+
 	if (start_command(cmd))
 		goto error;
 
-- 
2.12.0.246.ga2ecc84866-goog