Re: [PATCH] log,diff-tree: add --combined-with-paths options for merges with renames




On Fri, Jan 25, 2019 at 11:29 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Elijah Newren <newren@xxxxxxxxx> writes:
>
> > The raw diff format for merges with -c or --cc will only list one
> > filename, even if rename detection is active and a rename was detected
> > for the given path.  Examples:
> >
> >   ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM   describe.c
> >   ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM   bar.sh
> >   ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR   phooey.c
> >
> > This doesn't let us know what the original name of bar.sh was in the
> > first parent, and doesn't let us know what either of the original
> > names of phooey.c were in either of the parents.  In contrast, for
> > non-merge commits, raw format does provide original filenames (and a
> > rename score to boot).  In order to also provide original filenames
> > for merge commits, add a --combined-with-paths option (which is only
> > useful in conjunction with -c, --raw, and -M and thus implies all
> > those options) so that we can print tab-separated filenames when
> > renames are involved.  This transforms the above output to:
> >
> >   ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM   describe.c
> >   ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM   foo.sh  bar.sh
> >   ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR   fooey.c fuey.c  phooey.c
>
> I admit that I designed the original without too much thought.
> Perhaps we should have avoided discarding info, but it is way too
> late to fix with a default behaviour change.
>
> I am not sure if it is easy for consumers to guess which name on the
> output line corresponds to which input tree from the status letter,
> though.  Would it make it easier for consumers if this showed names
> in all input trees if any of them is different from the name in the
> resulting tree, I wonder?  Even in that case, the consumer must know
> some rule like "if R or C appears in the status column, then we have
> N preimage names plus the name in the result for N-way merge", so it
> may not be too bad to force them to know "for each of R or C in the
> status column, the name in the preimage tree is emitted, and the
> last name is the name in the result".  I dunno.

I wasn't able to guess the other fields in the raw format ("which of
the three blob shas is the final one and which ones were for parents")
without going and reading the diff-format.txt documentation.  Unless
we always list all names on all lines (even e.g. for 'MM' changes
which will list the same filename three times), we have a more
complicated case where people will have to refer to that
documentation.  I hope the extra paragraphs and examples I added there
are sufficient to spell it out.

Also, my first version of the patch actually showed all names, on all
lines, but I found the heavy repetition really annoying, and not in
keeping with how non-merge commits are handled (where original
filenames are only shown when they differ).  Granted, my approach isn't
the only option.  We could instead show all names whenever they are not
all identical, as you suggest and I also considered, but I liked this
way slightly better.  If others feel strongly, I can change it; that
was just my gut feeling and preference.

> > +For `-c` and `--cc`, only the destination or final path is shown even
> > +if the file was renamed on any side of history.  With
> > +`--combined-with-paths`, the number of paths printed will be one more
> > +than the number of 'R' characters in the concatenated status.  For
> > +each 'R' in the concatenated status characters, the original pathname
> > +on that side of history will be shown, and the final path shown on the
> > +line will be the path used in the merge.
>
> Is it safe for readers to pay attention to only 'R'?  Will it stay
> forever that way?  My immediate worry is 'C', but there might be
> other cases that original and result have different names.

Oops, yeah, it should also handle 'C' and be worded so that if any
future change type comes along involving different names then it'd be
included.
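
To make the rule concrete, here is a minimal sketch of how a consumer
might parse one such line.  It is only an illustration under several
assumptions: the function name and the returned dictionary are
hypothetical, the status field and the paths are assumed to be
tab-separated (as in ordinary raw output without -z), 'C' is treated
the same as 'R' as discussed above, and path quoting is ignored.  The
option itself is still under review, so the format may change.

```python
def parse_combined_raw_line(line):
    """Parse one combined raw diff line (hypothetical helper, not part of Git).

    Assumed layout: N colons, N+1 modes, N+1 blob ids, N status letters,
    then a tab and one or more tab-separated paths.
    """
    line = line.rstrip("\n")
    nparents = len(line) - len(line.lstrip(":"))
    if nparents < 2:
        raise ValueError("not a combined diff line")

    # Everything before the first tab is space-separated metadata.
    meta, *paths = line[nparents:].split("\t")
    fields = meta.split()
    n = nparents + 1  # parents plus the merge result
    if len(fields) != 2 * n + 1:
        raise ValueError("unexpected number of fields")

    modes, shas, status = fields[:n], fields[n:2 * n], fields[2 * n]
    if len(status) != nparents:
        raise ValueError("status length does not match parent count")

    # One original path per 'R' or 'C' status letter, plus the final path.
    has_own_name = [c in "RC" for c in status]
    if len(paths) != 1 + sum(has_own_name):
        raise ValueError("path count does not match status letters")

    final_path = paths[-1]
    originals = iter(paths[:-1])
    # Parents without a rename or copy share the final path.
    parent_paths = [next(originals) if own else final_path
                    for own in has_own_name]

    return {
        "parent_modes": modes[:-1], "result_mode": modes[-1],
        "parent_shas": shas[:-1], "result_sha": shas[-1],
        "status": status,
        "parent_paths": parent_paths, "path": final_path,
    }


example = "::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM\tfoo.sh\tbar.sh"
info = parse_combined_raw_line(example)
print(info["parent_paths"], "->", info["path"])
```

The point of the sketch is that the consumer has to walk the status
letters to work out which parent each extra name belongs to, which is
the extra knowledge being asked of them here.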

> > +--combined-with-paths::
> > +     This flag is similar to -c, but modifies the raw output format for
> > +     merges to also show the original paths when renames are found.
> > +     Implies -c, -M, and --raw.
>
> So, --cc -p is not allowed to use this?  I was wondering if we want
> to have a separate "even though traditionally we did not show
> preimage names in combined output, this option tells Git to do so,
> regardless of output format used, as long as 'combine-diff' is in
> effect".

You could kind of ask the same question of -c -p, actually.  I looked
into that, but I was only interested in raw format output and --cc is
only about coalescing uninteresting hunks in patches.  Whenever git
shows a combined diff in patch format, it always lists two files in
the header, e.g.:
  a/foo.c
  b/foo.c
perhaps because people have a built-in expectation that a diff has to
involve exactly two files.  I wasn't sure how deeply baked in that
assumption is, but as I was only interested in raw format I didn't
mess with it.  We'd have to switch to showing three or more names if we
want this to be relevant to such a mode.  Would it make sense to show
that to users in a patch?  I guess the "combined" patch is already kind
of special, so it could make sense, but it kind of feels like a
follow-on feature for someone else to implement... unless leaving it
out now somehow boxes us in.