Web lists-archives.com

RFC: rebase inconsistency in 2.18 -- course of action?

Hi everyone,

We've got a small rebase problem; I'm curious for thoughts about the
best course of action.  I'll provide several options below.

In short, while interactive-based and merge-based rebases handle
directory rename detection fine, am-based rebases do not.  This
inconsistency may frustrate or surprise users.

=== Demonstration Example ===

Let's say we have a repo with three commits -- an original commit
('O'), and two branches based off of it, A and B with an extra commit

  O: has files l, x/a, x/b, x/c
  A: renames l -> lt, x/ -> y/
  B: modifies l, adds x/d

If I checkout B and run

  git rebase --interactive A

and don't bother editing the list of commits at all but just save,
then I end up with lt, y/{a, b, c, d} as expected.  However, if I run

  git rebase A

then I end up with lt, y/{a, b, c} and x/d; d is in the wrong place.

=== Problem explanation ===

am-based rebases suffer from a reduced ability to detect directory
renames upstream, which is fundamental to the fact that it throws away
information about the history: in particular, it dispenses with the
original commits involved by turning them into patches, and without
the knowledge of the original commits we cannot determine a proper
merge base.

The am-based rebase will proceed after generating patches by
commit-wise first trying git-apply, and if that fails it will attempt
a 3-way merge.  Since am has no knowledge of original commits, it
instead reconstructs a provisional merge base using
build_fake_ancestor() which will only contain the original versions of
the files modified in the patch.  Without the full list of files in
the real merge base, renames of any missing files cannot be detected.
Directory rename detection works by looking at individual file renames
and deducing when a full directory has been renamed.

Trying to modify build_fake_ancestor() to instead create a merge base
that includes common file information by looking for a commit that
contained all the same blobs as the pre-image of the patch would
require a very expensive search through history, and may choose a
wrong commit from among the several it finds.  (Also, regular am
outside of rebase may not find any commit that matches, forcing it to
somehow choose a near-fit).  Further, this would only work when the
fallback to a 3-way merge is triggered (i.e. when git-apply fails),
which may not happen for some patches. (For example, if the l->lt
change were not in the example, then git-apply would succeed and
incorrectly place d in x/, so we'd never even hit the 3-way fallback.)

In short, the am machinery simply doesn't have the necessary
information to properly detect directory renames, except when other
files involved in the renames upstream were also modified on the side
with the new file additions.

=== Possible solutions ===

1. Just accept it as a known shortcoming.

2. Revert directory rename detection from master...again.

   (Please, please, let's not pick this one.)

3. Claim it's not a bug at all.  In particular, the git-rebase(1) manpage
   states for the `--merge` option:

       Use merging strategies to rebase.  When the recursive (default)
       merge strategy is used, this allows rebase to be aware of
       renames on the upstream side.

   While "renames" is precisely why rebase --merge was added in commit
   58634dbff822 back on 2006-06-21, am-based rebases gained the
   ability to handle normal (non-directory) renames back in commit
   18cdf802ca6e from 2008-11-10.  So folks might be used to and expect
   their simple rebases to handle renames the same as --merge or
   --interactive rebases do.

4. Edit the 2.18 release notes.  Point out that directory renames can
   be detected by cherry-pick, merge, and rebases with either the -i
   or -m options (or any rebase option that implies either -i or -m);
   but not always by other rebases.  Folks may want to set pull.rebase
   to 'interactive' and just get used to immediately saving the
   popped-up list of commits and exiting to avoid this problem.

   Also, there's no good way to avoid this problem when using git-am
   directly.  At best, we can tell users to carefully select which
   older commit base to apply the patches on top of and then
   immediately run rebase --merge afterward.

5. Add an --am option to git-rebase, and make it not be the default
   (though make it be triggered by am-specific options like
   --whitespace or -C).  For the default, use either:

     5a. do_merge=t

     5b. interactive_rebase=implied

   As a slight aside, since there are only a few small changes
   necessary to make git-rebase--interactive handle ALL functionality
   from git-rebase--merge, we could take 5b one step further and
   delete git-rebase--merge and have one less rebase type.

   Either 5a or 5b may have a minor performance hit here, though
   personally I don't see this as a major factor.  But I'm happy to
   discuss at a high level if folks are curious (exact figures would
   depend heavily on the repo, size of commits, etc.)

6. Since there's not much time left before 2.18, do (1) and (4) for
   now, and (5) for 2.19.  Continue doing (1) or (3) for git-am itself.

7. Other solutions I haven't thought of?

=== Final Notes ===

I have patches implementing 5b, including the bonus removal of
git-rebase--merge.  However:

  * while my patches apply cleanly on master or next, there are lots
    of minor conflicts for the final two patches with ag/rebase-p in
    pu (in fact, even master conflicts with ag/rebase-p).  So I'm
    curious if I should wait off on sending these patches until that
    topic advances.

  * My first 9 patches are 7 independent fixes and cleanups, so I'll
    send those out shortly.

Thanks for reading this far,