Web lists-archives.com

Re: [PATCH] rev-list: allow cached objects in existence check




Jeff King <peff@xxxxxxxx> writes:

> This fixes a regression in 7c0fe330d5 (rev-list: handle missing tree
> objects properly, 2018-10-05) where rev-list will now complain about the
> empty tree when it doesn't physically exist on disk.
>
> Before that commit, we relied on the traversal code in list-objects.c to
> walk through the trees. Since it uses parse_tree(), we'd do a normal
> object lookup that includes looking in the set of "cached" objects
> (which is where our magic internal empty-tree kicks in).
>
> After that commit, we instead tell list-objects.c not to die on any
> missing trees, and we check them ourselves using has_object_file(). But
> that function uses OBJECT_INFO_SKIP_CACHED, which means we won't use our
> internal empty tree.

Yikes.  Thanks for spotting.

> This patch makes the minimal fix, which is to swap out a direct call to
> oid_object_info_extended(), minus the SKIP_CACHED flag, instead of
> calling has_object_file(). This is all that has_object_file() is doing
> under the hood. And there's little danger of unrelated fallout from
> other unexpected "cached" objects, since there's only one call site that
> ends such a cached object, and it's in git-blame.

OK.  That last one is not even "cached" but "merely exists in-core"
virtual commit, that represents the locally modified state, right?
I do not think rev-list ever asks if these object exist, but if they
were asked, we should say they also exist.

> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> I prepared this directly on top of 7c0fe330d5, but it should merge
> cleanly into the current tip of master.
>
> I think we might also consider just having has_object_file() respect
> cached objects. The SKIP_CACHED flag comes from Jonathan Tan's
> e83e71c5e1 (sha1_file: refactor has_sha1_file_with_flags, 2017-06-21).
> But it was just matching the old behavior; it's not clear to me that we
> particularly care about that, and it wasn't simply that nobody bothered
> to put the cached-object check into has_sha1_file().

Yup, I am fine with such a clean-up after we fix this regression.

> Some concerns/arguments against it:
>
>   - we probably would want to make sure we do not short-cut
>     write_sha1_file(). I.e., we should still write it to disk when
>     somebody wants it. But I think that works, because that function
>     uses its own check-and-freshen infrastructure.
>
>   - some callers of has_sha1_file() might care about durability between
>     processes. Because it's baked in, the empty tree is safe for that
>     (whatever follow-on process runs, it will also be baked in there).
>     But that's not necessarily true for other "cached" objects. I'm not
>     really that worried about it because we use it sparingly (the only
>     call to pretend_sha1_file() is in git-blame, and if it ever did ask
>     "do we have this object", I actually think the right answer would be
>     "yes").

... and I realize that I should have read ahead before writing the
four lines above myself ;-)  We are on the same page.

>     But if this is a concern, we could perhaps have two levels of flags:
>     SKIP_CACHED and SKIP_INTERNAL.
>
>  builtin/rev-list.c           |  2 +-
>  t/t1060-object-corruption.sh | 10 ++++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index 49d6deed70..877b6561f4 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -237,7 +237,7 @@ static inline void finish_object__ma(struct object *obj)
>  static int finish_object(struct object *obj, const char *name, void *cb_data)
>  {
>  	struct rev_list_info *info = cb_data;
> -	if (!has_object_file(&obj->oid)) {
> +	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
>  		finish_object__ma(obj);
>  		return 1;
>  	}
> diff --git a/t/t1060-object-corruption.sh b/t/t1060-object-corruption.sh
> index ac1f189fd2..807b63b473 100755
> --- a/t/t1060-object-corruption.sh
> +++ b/t/t1060-object-corruption.sh
> @@ -125,4 +125,14 @@ test_expect_success 'fetch into corrupted repo with index-pack' '
>  	)
>  '
>  
> +test_expect_success 'internal tree objects are not "missing"' '
> +	git init missing-empty &&
> +	(
> +		cd missing-empty &&
> +		empty_tree=$(git hash-object -t tree /dev/null) &&
> +		commit=$(echo foo | git commit-tree $empty_tree) &&
> +		git rev-list --objects $commit
> +	)
> +'
> +
>  test_done