Web lists-archives.com

Re: [PATCH v6 6/6] blame: use a fingerprint heuristic to match ignored lines




Barret Rhoden <brho@xxxxxxxxxx> writes:

> This replaces the heuristic used to identify lines from ignored commits
> with one that finds likely candidate lines in the parent's version of
> the file.
>
> The old heuristic simply assigned lines in the target to the same line
> number (plus offset) in the parent.  The new function uses a
> fingerprinting algorithm to detect similarity between lines.
>
> The fingerprint code and the idea to use them for blame came from
> Michael Platings <michael@xxxxxxxxx>.
>
> For each line changed in the target, i.e. in a blame_entry touched by a
> target's diff, guess_line_blames() finds the best line in the parent,
> above a magic threshold.  Ties are broken by proximity of the parent
> line number to the target's line.
>
> We actually make two passes.  The first pass checks in the diff chunk
> associated with the blame entry - specifically from blame_chunk().
> Often times, those diff chunks are small; any 'context' in a normal diff
> chunk is broken up into multiple calls to blame_chunk().  We make a
> second pass over the entire parent, with a slightly higher threshold.

Two thoughts.

 - Unless the 'old heuristic' is still available as an option after
   this step, a series that first begins with the 'old heuristic'
   and then later replaces it with the 'new heuristic' feels
   somewhat wasteful of reviewer resources, as the 'old heuristic'
   does not contribute an iota to the end result.

   It is OK while the series is still in RFC/WIP stage, though.  But
   because I got an impression that this is close to completion, so...

 - I wonder if the hash used here can replace what is used in
   diffcore-delta.c as an improvement (or obviously vice versa), as
   using two (or more) ad-hoc fingerprinting function without having
   a clear reason why we need two instead of a unified one feels
   like a bad idea.