Web lists-archives.com

Re: [PATCH 1/1] archive: learn to include submodules in output archive




Nikhil Benesch <nikhil.benesch@xxxxxxxxx> writes:

> This commit is a revival of Lars Hjemli's 2009 patch to provide an
> option to include submodules in the output of `git archive`.
>
> The `--recurse-submodules` option (named consistently with fetch, clone,
> and ls-files) will recursively traverse submodules in the repository and
> consider their contents for inclusion in the output archive, subject to
> any pathspec filters. Like other commands that have learned
> `--recurse-submodules`, submodules that have not been checked out will
> not be traversed.
>
> Signed-off-by: Nikhil Benesch <nikhil.benesch@xxxxxxxxx>
> ---

I'll let others comment on the proposed log message.

> diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
> index cfa1e4ebe..f223f9e05 100644
> --- a/Documentation/git-archive.txt
> +++ b/Documentation/git-archive.txt
> @@ -11,8 +11,8 @@ SYNOPSIS
>  [verse]
>  'git archive' [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
>  	      [-o <file> | --output=<file>] [--worktree-attributes]
> -	      [--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
> -	      [<path>...]
> +	      [--recurse-submodules] [--remote=<repo> [--exec=<git-upload-archive>]]
> +	      <tree-ish> [<path>...]

With the reflowing of lines it was not immediately obvious, but the
only change is to addd [--recurse-submodules] in front of the
existing [--remote=<repo>] option.  It would have probably made more
sense to add new things at the end, just because these options can
be given in any order.

> +--recurse-submodules::
> +	Recursively include the contents of any checked-out submodules in
> +	the archive.

OK.  

A question to the submodule folks: Is "checked-out" the right terminology
in this context?  I am wondering if we have reached more official set
of words to express submodule states discussed in [*1*].

> -	if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
> -		if (args->verbose)
> -			fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
> -		err = write_entry(args, sha1, path.buf, path.len, mode);
> -		if (err)
> -			return err;
> -		return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
> -	}
> -
>  	if (args->verbose)
>  		fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
> -	return write_entry(args, sha1, path.buf, path.len, mode);
> +	err = write_entry(args, sha1, path.buf, path.len, mode);
> +	if (err)
> +		return err;
> +	if (S_ISDIR(mode) || (S_ISGITLINK(mode) && args->recurse_submodules &&
> +			      !add_submodule_odb(path_without_prefix)))
> +		return READ_TREE_RECURSIVE;
> +	return 0;

So, we used to (and without the option we still) stop recursing when
we see a gitlink after asking write_entry() to write the path itself
out, but with the option, we keep going.

How does pathspec limiting work with this codepath?  If you have a
superproject with a regular file "sub/README" that belongs to the
superproject itself, and a submodule at "sub/module/", what makes
sure that this command

    $ git archive --recurse-submodules HEAD -- 'sub/R*' 'sub/module/lib/'

looks only in 'lib/' part of the submodule?

> diff --git a/t/t5005-archive-submodules.sh b/t/t5005-archive-submodules.sh
> new file mode 100755
> index 000000000..747e38627
> --- /dev/null
> +++ b/t/t5005-archive-submodules.sh
> @@ -0,0 +1,112 @@
> +#!/bin/sh
> +
> +test_description='git archive can include submodule content'
> +
> +. ./test-lib.sh
> +
> +add_file()
> +{

Style.  A shell function begins like so (see Documentation/CodingGuidelines):

	add_file () {

> +	git add $1 &&

Put dq around "$1" to make it easier to reuse, even if you know you
only happen to use pathnames without any $IFS whitespace characters.

I think all of these comments apply to the other shell function you
added to this file..

> +test_expect_success 'by default, submodules are not included' '
> +	echo "File 1" >1 &&
> +	add_file 1 &&
> +	add_submodule 2 3 &&
> +	add_submodule 4 5 &&
> +	cat <<EOF >expected &&
> +1
> +2/
> +4/
> +EOF

Use <<-EOF to allow you to indent your here-doc data.  Also when
your here-doc data does not need any $variable_interpolation, quote
the end marker to help reduce reviewer's mental burden, i.e.

	cat <<-\EOF >expected &&
	1
	2/
	4/
	EOF

The same comment applies to other tests (I won't even repeat this).

> +	git archive HEAD >normal.tar &&
> +	tar -tf normal.tar >actual &&

Lose "-" from "-tf"; that's the canonical and the most portable way
to spell the option to "tar".

> +test_expect_success 'packed submodules are supported' '
> +	msg=$(cd 2 && git repack -ad && git count-objects) &&
> +	test "$msg" = "0 objects, 0 kilobytes" &&
> +	git archive --recurse-submodules HEAD >packed.tar &&
> +	tar -tf packed.tar >actual &&
> +	test_cmp expected actual
> +'

I am not sure how packing would be expected to break anything, but
an extra test may not hurt.

I notice that you only test the case where the submodule's ".git" 
is embedded in its working tree (as opposed to ".git" being a
"gitdir:" file that points at the real location of the repository,
somewhere inside ".git/modules/" of the superproject).  You probably
want to make sure that also works, too.

I also notice that you do not use any pathspec in your "git archive"
invocation.  You'd want to make sure that works as expected.

Thanks.


[Reference]

*1* <20170209020855.23486-1-sbeller@xxxxxxxxxx>