[PATCH] Rough draft: Split a git repo into superproject and submodule




I have some extremely large git repositories which I want to split into
modules.  This could be done using 'git filter-branch' and the following
steps:

 1) Create a submodule using --subdirectory-filter.
 2) Create a superproject using an index filter to delete the submodule.
 3) Commit the submodule to the latest version of the superproject.

Unfortunately, this approach loses all the historical connections
between the superproject and the submodule, breaking tools like 'git
bisect', and making it difficult to recover old releases.
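For reference, the three steps above might look roughly like this on a throwaway repository.  This is only a sketch: the names original, sub1-only, super and sub1 are made up for illustration, and step 3 records the submodule with plumbing (git config plus update-index) instead of 'git submodule add' so that nothing has to be cloned.  FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer versions of git print.

```shell
#!/bin/sh
# Sketch of the naive split; every name below is hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"

# Build a small repository with two commits and a subdirectory "sub1".
git init -q original
(
	cd original
	echo "In main project" >main-file
	mkdir sub1
	echo "In sub1" >sub1/sub1-file
	git add .
	git commit -q -m "Original project and sub1"
	echo "More" >>main-file
	git commit -q -a -m "Change main project only"
)

# 1) Create the submodule history with --subdirectory-filter.
git clone -q original sub1-only
(cd sub1-only && git filter-branch --subdirectory-filter sub1 -- --all)

# 2) Create the superproject history: delete sub1 from every commit.
git clone -q original super
(cd super && git filter-branch --index-filter \
	'git rm -q -r --cached --ignore-unmatch sub1' -- --all)

# 3) Commit the submodule to the latest version of the superproject only.
sub_tip=$(git -C sub1-only rev-parse HEAD)
(
	cd super
	git config -f .gitmodules submodule.sub1.path sub1
	git config -f .gitmodules submodule.sub1.url ../sub1-only
	git add .gitmodules
	git update-index --add --cacheinfo 160000 "$sub_tip" sub1
	git commit -q -m "Add sub1 as a submodule"
)

# Only the newest commit has a gitlink; older commits list nothing here.
git -C super ls-tree HEAD sub1
git -C super ls-tree HEAD~1 sub1
```

The last two commands show the problem: the tip of super has a 160000 entry for sub1, while its parent has no entry at all, so checking out or bisecting older revisions gives a tree with no sub1 content.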

Ideally, each version of the newly created superproject would be linked
to the correct version of the submodule (and all the .gitmodules entries
would be set up correctly, too, throughout the project's history).

The attached patch contains a _very_ rough draft of a git
submodule-split command.  You can run it as follows:

  git submodule-split original-repo superproject directory-to-split
  git clone superproject.git superproject
  cd superproject
  git submodule update --init

It will output two bare repositories: superproject.git and
directory-to-split.git (the new submodule).  Internally, it runs
'git filter-branch' on bare mirror clones of the original repository.
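The key trick is recording, for every original commit, the ID of the rewritten submodule commit, so that a second filter-branch pass can look it up.  Here is a minimal standalone sketch of that idea.  It is not the script itself: it uses --subdirectory-filter where the patch uses its own index filter, and the names src, sub.git and MAP_DIR are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: build an old-commit -> new-commit map with a commit filter.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"

# A source repository in which both commits touch sub1.
git init -q src
(
	cd src
	mkdir sub1
	echo one >sub1/sub1-file
	echo main >main-file
	git add .
	git commit -q -m "first"
	echo two >>sub1/sub1-file
	git commit -q -a -m "second"
)

git clone -q --bare src sub.git

# Exported so the filter, which runs in a child shell, can see it.
MAP_DIR="$work/map"
export MAP_DIR
mkdir "$MAP_DIR"

# filter-branch passes the tree and parents as "$@" and sets GIT_COMMIT
# to the original commit ID.  The new ID must be printed on stdout.
commit_filter='
new_commit=$(git commit-tree "$@") || exit 1
echo "$new_commit"
echo "$new_commit" >"$MAP_DIR/$GIT_COMMIT"
'

(cd sub.git &&
 git filter-branch --subdirectory-filter sub1 \
	--commit-filter "$commit_filter" -- --all)

# Print the resulting map, one "old -> new" pair per line.
for f in "$MAP_DIR"/*
do
	echo "$(basename "$f") -> $(cat "$f")"
done
```

Each file in the map directory is named after an original commit and holds the matching submodule commit, which is the value a superproject index filter would splice in as a 160000 entry.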

There's a test suite, too.

I still have quite a few things left to do before this script is
actually useful:

  Rename submodule-split to something more appropriate
  Update .gitmodules using git config
  Merge new entries with existing .gitmodules
  Add support for directory and repository names which differ
  Add support for multiple possible directory names
  Add support for directories which move around the tree
  Add support for directories which are missing in some revisions
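For the two .gitmodules items, one possible direction is 'git config -f', which edits a named file and leaves existing sections alone.  This is a rough sketch run in a scratch directory; the submodule names "existing" and "sub1" and their URLs are invented for the example.

```shell
#!/bin/sh
# Sketch: add a submodule entry to a .gitmodules file that already has one.
set -e
work=$(mktemp -d)
cd "$work"

# Pretend an earlier revision already declared a submodule.
cat >.gitmodules <<\EOF
[submodule "existing"]
	path = existing
	url = ../existing.git
EOF

# Add the entry for the newly split directory; the old one is preserved.
git config -f .gitmodules submodule.sub1.path sub1
git config -f .gitmodules submodule.sub1.url ../sub1.git

# List every submodule setting now in the file.
git config -f .gitmodules --get-regexp '^submodule\.'
```

Inside an index filter there is no checked-out .gitmodules, so the existing blob would presumably have to be written to a temporary file first (for example with 'git cat-file blob') and stored back with hash-object and update-index, as the current script already does for its generated file.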

I'm releasing this version for feedback on the general design and the
coding style.  I don't often write shell scripts of this magnitude, and
there are almost certainly some portability and style problems.  So please
let me know what needs improvement, and I'll try to fix it.

Thank you for your feedback!
---
 .gitignore                 |    1 +
 Makefile                   |    1 +
 git-submodule-split.sh     |  113 ++++++++++++++++++++++++++++++++++++++++++++
 t/t7404-submodule-split.sh |   36 ++++++++++++++
 4 files changed, 151 insertions(+), 0 deletions(-)
 create mode 100644 git-submodule-split.sh
 create mode 100755 t/t7404-submodule-split.sh

diff --git a/.gitignore b/.gitignore
index 13311f1..fa6ed07 100644
--- a/.gitignore
+++ b/.gitignore
@@ -119,6 +119,7 @@ git-show
 git-show-branch
 git-show-index
 git-show-ref
+git-submodule-split
 git-stage
 git-stash
 git-status
diff --git a/Makefile b/Makefile
index 27b9569..aceac8f 100644
--- a/Makefile
+++ b/Makefile
@@ -277,6 +277,7 @@ SCRIPT_SH += git-sh-setup.sh
 SCRIPT_SH += git-stash.sh
 SCRIPT_SH += git-submodule.sh
 SCRIPT_SH += git-web--browse.sh
+SCRIPT_SH += git-submodule-split.sh
 
 SCRIPT_PERL += git-add--interactive.perl
 SCRIPT_PERL += git-archimport.perl
diff --git a/git-submodule-split.sh b/git-submodule-split.sh
new file mode 100644
index 0000000..d7d1080
--- /dev/null
+++ b/git-submodule-split.sh
@@ -0,0 +1,113 @@
+#!/bin/sh
+# 
+# Split a repository into a submodule and main module, with history
+#
+# Copyright 2009 Eric Kidd
+# License: GNU General Public License, version 2 or later
+
+USAGE="src-repo dst-repo submodule-dir"
+
+OPTIONS_SPEC=
+NONGIT_OK=Yes
+. git-sh-setup
+
+# Keep our argument parsing simple for now.
+test "$#" = 3 || usage
+src_repo="$1"
+dst_repo="$2.git"
+sub_repo="$3.git"
+submodule_dir="$3"
+revs="--all"
+
+# Make a bare clone of a git repo with identical branches.
+git_mirror() {
+	git clone --mirror "$1" "$2" || exit 1
+	# For some reason, git clone --mirror doesn't actually create our
+	# local branch references for us.
+	# Exit from the function's shell, not just the subshell, on failure.
+	(cd "$2" && git fetch) || exit 1
+}
+
+# We export these variables so that they can be used from scripts passed to
+# git filter-branch.  Thanks to gitte for this trick, which also allows us
+# to do the right thing when subdirectory names contain spaces and quotes.
+export SPLIT_SUB_REPO="$sub_repo"
+export SPLIT_SUBMODULE_DIR="$submodule_dir"
+export SPLIT_MAP_DIR="`pwd`/$sub_repo/split-map"
+
+
+#--------------------------------------------------------------------------
+# Create the new submodule
+
+# Create a copy of $src_repo to transform.
+git_mirror "$src_repo" "$sub_repo"
+
+# For each commit ID, we will create files containing
+# information that we'll later use to rewrite the superproject.
+mkdir "$SPLIT_MAP_DIR" || exit 1
+
+index_filter=$(cat << \EOF
+map_info="$SPLIT_MAP_DIR/$GIT_COMMIT"
+if git rev-parse -q --verify $GIT_COMMIT:"$SPLIT_SUBMODULE_DIR"; then
+	# Adapted from git-filter-branch.
+	err=$(git read-tree -i -m $GIT_COMMIT:"$SPLIT_SUBMODULE_DIR" 2>&1) ||
+		die "$err"
+	printf '%s' "$SPLIT_SUBMODULE_DIR" > "$map_info-dir"
+else
+	# We will use an empty file to indicate that the directory
+	# doesn't exist in the tree.
+	# touch "$map_info-skipped"
+	die "Directory is missing"
+fi
+EOF
+)
+
+commit_filter=$(cat << \EOF
+map_info="$SPLIT_MAP_DIR/$GIT_COMMIT"
+new_commit="$(git commit-tree "$@")" || exit 1
+echo "$new_commit"
+echo "$new_commit" > "$map_info-submodule-commit" ||
+	die "Can't record the commit ID of the new commit"
+EOF
+)
+
+# Run our filters.
+(cd "$sub_repo" &&
+ git filter-branch --index-filter "$index_filter" \
+     --commit-filter "$commit_filter" -- "$revs") || exit 1
+
+
+#--------------------------------------------------------------------------
+# Create the new superproject
+
+# Next, create our new parent repository.
+git_mirror "$src_repo" "$dst_repo"
+
+index_filter=$(cat << \EOF
+map_info="$SPLIT_MAP_DIR/$GIT_COMMIT"
+
+# Splice the repo into the tree.
+test -f "$map_info-submodule-commit" || die "Can't find map for $GIT_COMMIT"
+git rm -q --cached -r "$SPLIT_SUBMODULE_DIR" || exit 1
+echo "160000 $(cat "$map_info-submodule-commit")	$SPLIT_SUBMODULE_DIR" |
+	git update-index --index-info || exit 1
+
+# Construct a new .gitmodules file.
+cat > "$SPLIT_MAP_DIR/gitmodules" <<EOC
+[submodule "$SPLIT_SUBMODULE_DIR"]
+	path = $SPLIT_SUBMODULE_DIR
+	url = ../$SPLIT_SUB_REPO
+EOC
+
+# Write the new .gitmodules file into the tree.
+new_obj=$(git hash-object -t blob -w "$SPLIT_MAP_DIR/gitmodules") ||
+	die "Error adding new .gitmodules file to tree"
+git update-index --add --cacheinfo 100644 "$new_obj" .gitmodules || exit 1
+
+EOF
+)
+
+# Run our filter.
+(cd "$dst_repo" &&
+ git filter-branch --index-filter "$index_filter" -- "$revs") || exit 1
+
+exit 0
diff --git a/t/t7404-submodule-split.sh b/t/t7404-submodule-split.sh
new file mode 100755
index 0000000..b490c60
--- /dev/null
+++ b/t/t7404-submodule-split.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+#
+# Copyright 2009 Eric Kidd
+
+test_description='git submodule-split tests'
+. ./test-lib.sh
+
+rm -rf .git
+test_create_repo original
+
+test_expect_success \
+	'create original repository' \
+	'(cd original &&
+	  echo "In main project" > main-file &&
+	  mkdir sub1 &&
+	  echo "In sub1" > sub1/sub1-file &&
+	  git add . &&
+	  git commit -m "Original project and sub1")'
+
+test_expect_success \
+	'split out sub1' \
+	'git submodule-split original split1 sub1 &&
+	 git clone split1.git split1 &&
+	 test -f split1/main-file &&
+	 ! test -f split1/sub1/sub1-file &&
+	 git clone sub1.git sub1 &&
+	 test -f sub1/sub1-file'
+
+test_expect_success \
+	'compare split repositories with original' \
+	'rm -rf split1 &&
+	 git clone split1.git split1 &&
+	 (cd split1 && git submodule init sub1 && git submodule update) &&
+	 diff -uNr -x .git -x .gitmodules original split1'
+
+test_done
-- 
1.6.0.4
