Web lists-archives.com

Re: [PATCH 1/1] archive: learn to include submodules in output archive

Welcome to the Git community!

On Sat, Mar 11, 2017 at 11:54 PM, Nikhil Benesch
<nikhil.benesch@xxxxxxxxx> wrote:
> This commit is a revival of Lars Hjemli's 2009 patch to provide an
> option to include submodules in the output of `git archive`.

I am unaware of said patch, could you link to it, e.g.
on https://public-inbox.org/git/ ?

> The `--recurse-submodules` option (named consistently with fetch, clone,
> and ls-files)

... and unlike fetch and push we do not need to introduce extra options?
(or they can be added later when the need arises)

> will recursively traverse submodules in the repository and
> consider their contents for inclusion in the output archive, subject to
> any pathspec filters.


> Like other commands that have learned
> `--recurse-submodules`, submodules that have not been checked out will
> not be traversed.

Junio writes:
> A question to the submodule folks: Is "checked-out" the right terminology
> in this context?  I am wondering if we have reached more official set
> of words to express submodule states discussed in [*1*].


The "checked-out" here would translate to "populated" in said proposal.

I guess that is sufficient for the first implementation, but eventually we
might be interested in "recurse-submodules=active/init" as well.
Consider the case, where you delete a submodule and commit it.
Then you still want to be able to create an archive for HEAD^ (with
submodules). So in that case we'd want to recurse into any submodule
that has a git directory with the commit as recorded by the superproject,
i.e. the example given would refer to "depopulated" in the referenced

When the initialization is missing, we may not be interested in that
submodule any more, but we don't know for the user, so a user may
want to ask for "archive --recurse-submodules={all / initialized-only /
all-submoduless-with-git-dir, none, working-tree}". No need to add all
these options in the first patch, but keep that in mind for future

> @@ -59,6 +59,10 @@ OPTIONS
>         Look for attributes in .gitattributes files in the working tree
>         as well (see <<ATTRIBUTES>>).
> +--recurse-submodules::
> +       Recursively include the contents of any checked-out submodules in
> +       the archive.
> +

Here is "any", in the commit message we had "subject to any pathspec filters."

> @@ -132,18 +133,15 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>                 args->convert = ATTR_TRUE(check->items[1].value);
>         }
> -       if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
> -               if (args->verbose)
> -                       fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
> -               err = write_entry(args, sha1, path.buf, path.len, mode);
> -               if (err)
> -                       return err;
> -               return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
> -       }
> -
>         if (args->verbose)
>                 fprintf(stderr, "%.*s\n", (int)path.len, path.buf);
> -       return write_entry(args, sha1, path.buf, path.len, mode);
> +       err = write_entry(args, sha1, path.buf, path.len, mode);
> +       if (err)
> +               return err;
> +       if (S_ISDIR(mode) || (S_ISGITLINK(mode) && args->recurse_submodules &&
> +                             !add_submodule_odb(path_without_prefix)))

This is a bit hard to read. Maybe reformatting can help

     if (S_ISDIR(mode) ||
        (S_ISGITLINK(mode) &&
         args->recurse_submodules &&
        return ...

Though I wonder if we need to special case the
return value of add_submodule_odb as that could
be an error instead of fall through "return 0;" ?

> @@ -419,6 +418,8 @@ static int parse_archive_args(int argc, const char **argv,
>                         N_("prepend prefix to each pathname in the archive")),
>                 OPT_STRING('o', "output", &output, N_("file"),
>                         N_("write the archive to this file")),
> +               OPT_BOOL(0, "recurse-submodules", &recurse_submodules,
> +                       N_("recurse through submodules")),

This is the first time hearing "through" used with submodules.
I guess it makes sense when looking at the contents on disk as
a tree data structure.

push:    N_("control recursive pushing of submodules"),
fetch:    N_("control recursive fetching of submodules"),
grep:    N_("recursivley search in each submodule")),
ls-files:    N_("recurse through submodules")),

Ok, we introduced recursing "through" submodules with ls-files.

> +
> +add_submodule()
> +{
> +       mkdir $1 && (
> +               cd $1 &&
> +               git init &&
> +               echo "File $2" >$2 &&
> +               add_file $2

test_create_repo / test_commit
might come in handy here as well.

> +       ) &&
> +       add_file $1
> +}
> +
> +test_expect_success 'by default, submodules are not included' '

This test case may be covered elsewhere (in the default archive tests)
already, though not spelled out as such?

> +test_expect_success 'with --recurse-submodules, checked out submodules are included' '
> +       cat <<EOF >expected &&

In the modern parts of Git we indent the content of the here-doc-text
(If the operator is “<<-” instead of “<<”, then leading tabs in the
here-doc-text are stripped.)
i.e. cat <<-\EOF >expect
    well indented text
    that doesn't break visual alignment
    of tests.
    git archive ...

There are a lot of old tests still out there, so it is easy to
find them for guidance first. :/

> +test_expect_success 'submodules in submodules are supported' '

cool! Usually these are referred to as "nested submodules" in
case you want to shorten the description.

> +test_expect_success 'packed submodules are supported' '

Not sure what we try to test here? The internal representation
of the submodule ought to not matter.

> +       msg=$(cd 2 && git repack -ad && git count-objects) &&
> +       test "$msg" = "0 objects, 0 kilobytes" &&

I read that in submodule tests before; t7408 that tests if
submodules with alternates are useful. I am not convinced we
need this test here?

> +test_expect_success 'pathspecs supported' '
> +       git archive --recurse-submodules HEAD >recursive.tar &&

The test ought to test that Git can use pathspecs ...

> +       tar -tf recursive.tar 4/6/7 2/3 >actual &&

... unlike tar. We consider all external tools to be perfect within
the git test suite. So we rather want to test for the absence of
paths, that were not given as a pathspec to the git archive command.

> +test_expect_success 'missing submodule packs triggers an error' '


> +       mv 2/.git/objects/pack .git/packdir2 &&

Uh :/ please do not assume we have the submodule at
<path>/.git. See 293ab15ee (submodule: teach rm to remove
submodules unless they contain a git directory, 2012) for example.

> +       test_must_fail git archive --recurse-submodules HEAD

Do we want to also check for a sensible error message here?
    test_must_fail git archive ... >out &&
    test_i18n_grep "<message>" out

> +test_expect_success '--recurse-submodules skips non-checked out submodules' '

I read that as "skips unpopulated" ...