
Un-submodule a repository with submodules




Hi all,

my starting point is a Mercurial repository containing several
sub-repositories.  The parent repo has very little content of its own
but mostly acts as an "orchestrator" of the sub-repos, i.e., it contains
build files for building the project as a whole, and it has branches
which are reflected in the sub-repos (i.e., branch r1.0 in the parent
tracks the branch of the same name in all sub-repos).

The repository has sub-repositories only because of an injudicious
decision made a long time ago.  All the sub-repositories are internal
projects developed by the same team; they have never been, and never
will be, used anywhere but in the current parent project.

My goal is to convert it to Git, do an on-the-fly UTF-8/UNIX-EOL
conversion for all source code files, and get rid of the
sub-repositories.

What I have so far is the Git/UTF-8/EOL conversion using the excellent
fast-export tool [1].  I.e., now I have Git repositories for all the
sub-repos and a parent Git project including all those as Git
submodules.

So what's left is the "un-submoduling" part.  Of course, I've searched
the net for solutions.  The best one I've found is this script [2]
which is based on this blog posting [3].

If I understand that correctly, what it does is essentially:

0. Remove the submodule from the parent project.
1. Use "git filter-branch" to rewrite the submodule's history so that
   its commits appear to modify files in the submodule directory of the
   parent project rather than in the submodule's own root (.).
2. Fetch the rewritten history into the parent project.
3. Do a merge, clone, add, commit combo which I don't quite understand.

   $ git merge -s ours --no-commit --allow-unrelated-histories \
               "${sub}/${branch}"
   # Add submodule content
   $ git clone -b "${branch}" "${url}" "${path}"
   $ git add "${path}"
   $ git commit -m "Merge submodule contents for ${sub}/${branch}"
   
   (Couldn't it just run git merge --allow-unrelated-histories
   "${sub}/${branch}" instead?)
   
Anyway, the result of the procedure is that *after* the commit created
in step 3, the submodule is properly integrated including all its
history.
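
Steps 1 and 2 can be illustrated on throwaway repositories.  The
following is a minimal sketch, not the code of the script in [2]: it
uses an --index-filter with "git read-tree --prefix" as one possible way
to move every commit's tree under a directory, and the names libfoo,
UNSUB_DIR and the remote name "sub" are hypothetical.  In this toy setup
the parent never had a submodule at that path, so a plain merge stands
in for step 3.

```shell
#!/bin/sh
# Sketch only: everything happens in throwaway repos under a temp dir.
set -eu

export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export UNSUB_DIR=libfoo                  # hypothetical submodule directory

work=$(mktemp -d)

# Stand-in for one converted sub-repository with two commits.
git init -q "$work/sub"
cd "$work/sub"
echo one > a.txt
git add a.txt
git commit -q -m "sub: add a.txt"
echo two >> a.txt
git commit -q -am "sub: extend a.txt"
sub_branch=$(git symbolic-ref --short HEAD)

# Step 1: rewrite each commit so that its whole tree sits under $UNSUB_DIR/.
# The filter empties the temporary index, then re-reads the original
# commit's tree beneath the prefix.
git filter-branch --index-filter '
    git read-tree --empty &&
    git read-tree --prefix="$UNSUB_DIR/" "$GIT_COMMIT"
' -- --all >/dev/null

# Stand-in for the parent project (no submodule entry in this toy case).
git init -q "$work/parent"
cd "$work/parent"
echo "all:" > Makefile
git add Makefile
git commit -q -m "parent: add build file"

# Step 2: fetch the rewritten history into the parent.
git remote add sub "$work/sub"
git fetch -q sub

# Join the two unrelated histories with an ordinary merge.
git merge -q --allow-unrelated-histories \
    -m "Merge history of $UNSUB_DIR" "sub/$sub_branch"

git log --oneline --graph
```

Afterwards "git log -- libfoo" in the parent lists the rewritten
sub-repository commits, but only commits reachable from the merge see
the directory; older parent commits are unchanged.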
   
However, what doesn't satisfy me about that solution is that if I
check out a commit before the "Merge submodule" commit made above (or
some tag or another branch), it'll still have the submodules.

So is there some way to integrate all submodules into the parent project
in such a way that it appears as if they have always been just commits
touching files inside some directory in the parent project?

I guess my wish is not too uncommon, so the apparent lack of a
ready-made solution might be a good indicator of its infeasibility.  If
so, what would you suggest to mitigate the transition pain?

I'm thinking of a fallback plan like this:

  - Un-submodule just the master branch using the script in [2].
  - On each dev computer, have a git worktree for master and one for
    everything older because I assume it's tedious to frequently switch
    between submodule/non-submodule commits.
  - Bugs are usually fixed on the oldest active applicable branch and
    then merged or cherry-picked upwards.  How do I get them into master
    assuming the commit is in a sub-repository on another branch?  Is
    there something easier than plain diff/patch?
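
As an illustration of this fallback plan, here is a minimal sketch on
throwaway repositories.  It assumes one possible way of carrying a fix
over, namely "git format-patch" piped into "git am --directory=...",
which prepends a directory to the paths in the patch.  The names libfoo,
mono and the branch r1.0 are hypothetical, and the toy old branch
contains no real submodule.

```shell
#!/bin/sh
# Sketch only: everything happens in throwaway repos under a temp dir.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# Stand-in for a sub-repository that still lives on an older branch.
git init -q "$work/sub"
cd "$work/sub"
printf 'line1\nline2\n' > a.txt
git add a.txt
git commit -q -m "sub: add a.txt"

# Stand-in for the un-submoduled project: same file, now under libfoo/.
git init -q "$work/mono"
cd "$work/mono"
mkdir libfoo
cp "$work/sub/a.txt" libfoo/a.txt
git add libfoo
git commit -q -m "mono: libfoo integrated"

# A second worktree for older work, so the main checkout stays put.
git branch r1.0
git worktree add "$work/mono-r1.0" r1.0 >/dev/null 2>&1

# A bug fix is committed in the sub-repository.
cd "$work/sub"
printf 'line1\nline2 fixed\n' > a.txt
git commit -q -am "sub: fix line2"

# Carry the fix into the integrated branch, prefixing paths with libfoo/.
git format-patch -1 --stdout HEAD |
    git -C "$work/mono" am -q --directory=libfoo

git -C "$work/mono" log --oneline
```

Unlike plain diff/patch, this keeps the original author, date and commit
message of the fix.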

Thanks a lot for any help and suggestions,
Tassilo

* Footnotes
[1] https://github.com/frej/fast-export/
[2] https://github.com/jeremysears/scripts/blob/master/bin/git-submodule-rewrite
[3] https://x3ro.de/2013/09/01/Integrating-a-submodule-into-the-parent-repository.html