Re: [PATCH v6 6/6] blame: use a fingerprint heuristic to match ignored lines
- Date: Sun, 14 Apr 2019 10:41:26 +0100
- From: Michael Platings <michael@xxxxxxxxx>
- Subject: Re: [PATCH v6 6/6] blame: use a fingerprint heuristic to match ignored lines
> - I wonder if the hash used here can replace what is used in
> diffcore-delta.c as an improvement (or obviously vice versa), as
> using two (or more) ad-hoc fingerprinting function without having
> a clear reason why we need two instead of a unified one feels
> like a bad idea.
If I understand correctly, the algorithm in diffcore-delta.c is
intended to match files that contain identical lines (or 64-byte
chunks). The fingerprinting that Barret & I are talking about is
intended to match lines that contain identical byte pairs.
With significant refactoring, you could make the diffcore-delta
algorithm apply in both cases, but I think the end result would be
longer and more complicated than keeping the two separate.
Unlike hashing a line, hashing a byte pair is trivial: the pair itself
serves as the "hash". And unlike line hashes, the byte pairs overlap,
so every byte except the first and last is included in two "hashes".
For example, "hello" is hashed to "he", "el", "ll", "lo".
So based on my limited understanding of diffcore-delta.c, I think the
two algorithms are sufficiently different in intent and in
implementation that it's appropriate to keep them separate.
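
To make the byte-pair idea above concrete, here is a minimal sketch in C.
It is not the code from the patch: pair_at() and common_pairs() are
hypothetical names made up for illustration, and the real fingerprinting
may normalise or weight its input differently. It only shows that each
overlapping pair can serve directly as its own 16-bit "hash", and that
two lines can be compared by counting the pairs they share.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the "hash" of the byte pair starting at s[i] is
 * simply the two bytes packed into a 16-bit value. */
static unsigned int pair_at(const char *s, size_t i)
{
	unsigned char a = (unsigned char)s[i];
	unsigned char b = (unsigned char)s[i + 1];
	return ((unsigned int)a << 8) | b;
}

/* Hypothetical helper: number of byte pairs two NUL-terminated lines
 * have in common, treating each line as a multiset of overlapping
 * pairs. Returns 0 if memory cannot be allocated. */
static size_t common_pairs(const char *x, const char *y)
{
	size_t lx = strlen(x), ly = strlen(y), i, common = 0;
	unsigned int *counts = calloc(65536, sizeof(*counts));

	if (!counts)
		return 0;
	/* Count every overlapping pair in x: "hello" -> he, el, ll, lo. */
	for (i = 0; i + 1 < lx; i++)
		counts[pair_at(x, i)]++;
	/* Each pair in y can match at most one unused pair from x. */
	for (i = 0; i + 1 < ly; i++) {
		unsigned int p = pair_at(y, i);
		if (counts[p]) {
			counts[p]--;
			common++;
		}
	}
	free(counts);
	return common;
}
```

With this sketch, common_pairs("hello", "help") counts the shared pairs
"he" and "el". There is no per-line hash function and no chunking, which
is the main reason the two approaches share so little code.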
Regarding the "old heuristic", I think there may still be a use case
for it, but I'll expand on that later.