Web lists-archives.com

Re: [PATCH 1/2] filter-branch: Add --state-branch to hold pickled copy of ref map




Ian Campbell <ijc@xxxxxxxxxxxxxx> writes:

> Allowing for incremental updates of large trees.

"by doing what" is missing.  And ...

>
> I have been using this as part of the device tree extraction from the Linux
> kernel source since 2013, about time I sent the patch upstream!

... this does not help understanding what is going on.  It belongs
to the space after three dashes.

Perhaps

	Subject: filter-branch: stash away ref map in a branch

	With "--state-branch=<branchname>" option, the mapping from
	old object names and filtered ones in ./map/ directory is
	stashed away in the object database, and the one from the
	previous run is read to populate the ./map/ directory,
	allowing for incremental updates of large trees.

or something?

>
> Signed-off-by: Ian Campbell <ijc@xxxxxxxxxxxxxx>
> ---
>  git-filter-branch.sh | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 3a74602ef..d07db3fee 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -86,7 +86,7 @@ USAGE="[--setup <command>] [--env-filter <command>]
>  	[--parent-filter <command>] [--msg-filter <command>]
>  	[--commit-filter <command>] [--tag-name-filter <command>]
>  	[--subdirectory-filter <directory>] [--original <namespace>]
> -	[-d <directory>] [-f | --force]
> +	[-d <directory>] [-f | --force] [--state-branch <branch>]
>  	[--] [<rev-list options>...]"
>  
>  OPTIONS_SPEC=
> @@ -106,6 +106,7 @@ filter_msg=cat
>  filter_commit=
>  filter_tag_name=
>  filter_subdir=
> +state_branch=
>  orig_namespace=refs/original/
>  force=
>  prune_empty=
> @@ -181,6 +182,9 @@ do
>  	--original)
>  		orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
>  		;;
> +	--state-branch)
> +		state_branch="$OPTARG"
> +		;;
>  	*)
>  		usage
>  		;;
> @@ -252,6 +256,20 @@ export GIT_INDEX_FILE
>  # map old->new commit ids for rewriting parents
>  mkdir ../map || die "Could not create map/ directory"
>  
> +if [ -n "$state_branch" ] ; then
> +	state_commit=`git show-ref -s "$state_branch"`

I hate to nitpick styles, especially on this script that already has
existing violations, but for completeness:

Style: we prefer to write $(command substitution) instead.
Style: we prefer to write "if test", not "if [".
Style: we prefer to avoid ';' and write "if test condtion" and
       "then" on different lines.

It is a bit curious use of "show-ref".  It is not wrong per-se, but
"git rev-parse" may be more common.  I do not care too deeply either
way, though.

Don't we want to make sure the value given to --state-branch is a
full refname, not just a branch name?  What happens when you say 

	filter-branch --state-branch master

by mistake?  "show-ref -s" is likely to show your refs/heads/master,
and other master branches that appear as remote-tracking branches for
the remotes you interact with.

> +	if [ -n "$state_commit" ] ; then
> +		echo "Populating map from $state_branch ($state_commit)" 1>&2
> +		git show "$state_commit":filter.map |
> +		    perl -n -e 'm/(.*):(.*)/ or die;
> +				open F, ">../map/$1" or die;
> +				print F "$2" or die;
> +				close(F) or die'

The process calling this perl script, which carefully diagnoses
malformed input and dies, does not seem to do anything when it sees
errors.  Intended?

> +	else
> +		echo "Branch $state_branch does not exist. Will create" 1>&2
> +	fi
> +fi
> +
>  # we need "--" only if there are no path arguments in $@
>  nonrevs=$(git rev-parse --no-revs "$@") || exit
>  if test -z "$nonrevs"
> @@ -544,6 +562,25 @@ if [ "$filter_tag_name" ]; then
>  	done
>  fi
>  
> +if [ -n "$state_branch" ] ; then
> +	echo "Saving rewrite state to $state_branch" 1>&2
> +	STATE_BLOB=$(ls ../map |
> +	    perl -n -e 'chomp();
> +			open F, "<../map/$_" or die;
> +			chomp($f = <F>); print "$_:$f\n";' |

I see it somewhat gross to pipe the output of "/bin/ls" to a Perl
script, instead of iterating over "while (<../map/*>)" inside the
script itself.

> +	    git hash-object -w --stdin )
> +	STATE_TREE=$(/bin/echo -e "100644 blob $STATE_BLOB\tfilter.map" | git mktree)
> +	STATE_PARENT=$(git show-ref -s "$state_branch")

Don't you already have this in $state_commit?

One advantage of reading $state_branch again at this point is to
detect mistakes of running more than one filter-branch (which may
cause you to read $STATE_PARENT that is different from $state_commit
you read earlier), but I do not think that is being done here, so...

> +	unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
> +	unset GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE

Hmph.  I can see that you are trying not to be affected by the
committers and authors of the commits on the branch being filtered
(which are set by finish_ident shell function), but I wonder if we
could (and more importantly "want to") do better to preserve the
real committer the user who runs the script may have in the
environment before running it.  I guess it does not matter that
much, as long as the user has properly user.{name,email} configured
elsewhere without relying on the environment variable.

> +	if [ -n "$STATE_PARENT" ] ; then
> +	    STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" -p "$STATE_PARENT")
> +	else
> +	    STATE_COMMIT=$(/bin/echo "Sync" | git commit-tree "$STATE_TREE" )
> +	fi
> +	git update-ref "$state_branch" "$STATE_COMMIT"
> +fi
> +
>  cd "$orig_dir"
>  rm -rf "$tempdir"

Despite all the above comments, I like what you are trying to
achieve here.  Thanks for sharing.