Web lists-archives.com

Re: [WIP 2/2] submodule: read-only super-backed ref backend




On Fri, Dec 1, 2017 at 2:50 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> Note that a few major parts are still missing:
>  - special handling of the current branch of the superproject
>  - writing (whether "refs/..." to the superproject as an index change or
>    a commit, or non-"refs/..." directly to the subproject like usual)
>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>


> --- a/refs.c
> +++ b/refs.c
> @@ -1575,12 +1575,17 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
>  static struct ref_store *ref_store_init(const char *gitdir,
>                                         unsigned int flags)
>  {
> -       const char *be_name = "files";
> -       struct ref_storage_be *be = find_ref_storage_backend(be_name);
> +       struct ref_storage_be *be;
>         struct ref_store *refs;
>
> +       if (getenv("USE_SP")) {
> +               be = &refs_be_sp;
> +       } else {
> +               be = &refs_be_files;
> +       }
> +
>         if (!be)
> -               die("BUG: reference backend %s is unknown", be_name);
> +               die("BUG: reference backend %s is unknown", "files");

If we pursue this approach more than just RFC-ish we would probably not use
an environment variable but have some detection logic for the
different ref backends.

For example Shawns refstable[1] proposal uses the configuration
`extensions.refStorage == reftable`, which this proposal could reuse and
set to 'indexed_superproject' or 'commit_in_superproject', depending on the
write semantics that we'd need to disucss (or do we want to offer 2 different
superproject backends having these different semantics?)

More to the code here, we could still use `find_ref_storage_backend()`
depending on the env:

    const char *be_name = "files";
    ...
+    if (getenv("USE_SP"))
+        be_name = "sp-backend";
    ...

and then we'd have to change the 'next ' pointer in refs_be_files
(in refs/files-backend.c line ~3100) to point at the superproject backend
instead of NULL.

[1] https://googlers.googlesource.com/sop/jgit/+/reftable/Documentation/technical/reftable.md#Repository-format

> +static int sp_read_raw_ref(struct ref_store *ref_store,
> +                             const char *refname, struct object_id *oid,
> +                             struct strbuf *referent, unsigned int *type)
> +{
> +       struct sp_ref_store *refs;
> +
> +       refs = sp_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
> +
> +       if (!starts_with(refname, "refs/")) {
> +               return refs_read_raw_ref(refs->files, refname, oid, referent, type);
> +       }
> +
> +       /* read from the superproject instead */
> +       return get_superproject_gitlink_oid(refname, oid);
> +}

This function would need to get more sophisticated logic I would assume;
the current RFC is a read-only thing, such that it may be sufficient
to differentiate
between refs/* and specials such as [FETCH_]HEAD, but as seen in the
sp_ref_store struct definition we're already keeping a local reflog. I would
assume that remotes are also local to the submodule instead of reusing the
superprojects remote, for the sake of pulling in upstream changes of the
submodule instead of being fully integrated.



> diff --git a/submodule.c b/submodule.c
> index ce511180e..1ffaeec82 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -471,6 +471,7 @@ static void prepare_submodule_repo_env_no_git_dir(struct argv_array *out)
>  void prepare_submodule_repo_env(struct argv_array *out)
>  {
>         prepare_submodule_repo_env_no_git_dir(out);
> +       argv_array_pushf(out, "USE_SP");
>         argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
>                          DEFAULT_GIT_DIR_ENVIRONMENT);
>  }

This is an easy way to enforce all submodules use the superproject refs mode,
which makes it easy to make recursive commands (e.g. `git checkout --recursive`
would not need to pass down the exact gitlink sha1, but could pass
down the branch
name or even commit-ish "HEAD^^" and the submodule stays in perfect
sync (according
to superproject commit-ish). Once we take this further I would imagine
there is an option
in the submodule that specifies what mode it is using. Not sure if
we'd want to keep the
environment variable to force that mode from the superproject, though.

> @@ -1986,7 +1987,7 @@ void absorb_git_dir_into_superproject(const char *prefix,
>   * Return 0 if successful, 1 if not (for example, if the parent
>   * directory is not a repo or an unrelated one).
>   */
> -int get_superproject_entry(const char **full_name)
> +static int get_superproject_entry(const char *ref, const char **full_name, struct object_id *oid)

Making it static could be part of the prior patch?

When the ref is given we use `ls-tree -r` instead of `ls-files --stage`
which give similar outputs to have the parsing logic overlap below,
the difference
is whether the index is looked up or the given ref.

> @@ -2016,9 +2017,11 @@ int get_superproject_entry(const char **full_name)
>         prepare_submodule_repo_env(&cp.env_array);
>         argv_array_pop(&cp.env_array);
>
> -       argv_array_pushl(&cp.args, "--literal-pathspecs", "-C", "..",
> -                       "ls-files", "-z", "--stage", "--full-name", "--",
> -                       subpath, NULL);
> +       argv_array_pushl(&cp.args, "--literal-pathspecs", "-C", "..", NULL);
> +       if (ref)
> +               argv_array_pushl(&cp.args, "ls-tree", "-z", "--full-name", "-r", ref, subpath, NULL);

-- between ref and path?

> @@ -2063,7 +2077,7 @@ const char *get_superproject_working_tree(void)
>         size_t len;
>         const char *ret;
>
> -       if (get_superproject_entry(&full_name))
> +       if (get_superproject_entry(NULL, &full_name, NULL))
>                 return NULL;
>
>         super_wt = xstrdup(xgetcwd());
> @@ -2075,6 +2089,13 @@ const char *get_superproject_working_tree(void)
>         return ret;
>  }
>
> +int get_superproject_gitlink_oid(const char *ref, struct object_id *oid)
> +{
> +       if (get_superproject_entry(ref, NULL, oid))

I see; the get_superproject_entry function is more powerful, than what
I proposed
in the status series. It takes vastly different arguments, for
inspection, so the caller
is responsible instead of having dumber callers and two different functions

>
> +test_expect_success 'ref backend based on superproject data' '
> +       rm -rf sub super &&
> +
> +       git init sub &&
> +       test_commit -C sub first &&
> +       test_commit -C sub second &&
> +
> +       git init super &&
> +
> +       # master branch in superproject, submodule at second
> +       git -C super submodule add ../sub sub &&
> +       git -C super commit -m x &&
> +
> +       # anotherbranch in superproject, submodule at first
> +       git -C super checkout -b anotherbranch &&
> +       git -C super/sub checkout first &&
> +       git -C super commit -a -m x &&
> +
> +       # Notice that rev-parse can parse "anotherbranch" and see that it
> +       # points to first, even though a branch with the name "anotherbranch"
> +       # is only defined in the superproject
> +       git -C sub rev-parse first >expect &&
> +       USE_SP=1 git -C super/sub rev-parse anotherbranch >actual &&
> +       test_cmp expect actual
> +'

Thanks for writing this proof of concept.
As Jonathan Nieder mentioned another similar direction of this idea could be
used to decorate the branches in the submodule to indicate where the
superproject
points to.

I think we'd need to hash out the details of the write operations,
before going further here,
but this is the best long term solution so far IMHO. (Short term we
might carry local
patches that make submodules more repo like).

Thanks,
Stefan