Web lists-archives.com

Re: [Question] Signature calculation ignoring parts of binary files

Junio C Hamano <gitster@xxxxxxxxx> writes:

> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
>> The scenario is slightly different.
>> 1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
>> git unchanged.
>> 2. Person B gives me binary file-2 with fingerprint B2. This does not go
>> into git yet.
>> 3. We attempt a git diff between the committed file-1 and uncommitted file-2
>> using a textconv implementation that strips what we don't need to compare.
>> 4. If file-1 and file-2 have no difference when textconv is used, file-2 is
>> not added and not committed. It is discarded with impunity, never to be seen
>> again, although we might whine a lot at the user for attempting to put
>> file-2 in - but that's not git's issue.
> You are forgetting that Git is a distributed version control system,
> aren't you?  Person A and B can introduce their "moral equivalent
> but bytewise different" copies to their repository under the same
> object name, and you can pull from them--what happens?
> It is fundamental that one object name given to Git identifies one
> specific byte sequence contained in an object uniquely.  Once you
> broke that, you no longer have Git.

Having said all that, if you want to keep the original with frills
but somehow give these bytewise different things that reduce to the
same essence (e.g. when passed thru a filter like textconv), I
suspect a better approach might be to store both the "original" and
the result of passing the "original" through the filter in the
object database.  In the above example, you'll get two "original"
objects from person A and person B, plus one "canonical" object that
are bytewise different from either of these two originals, but what
they reduce to when you use the filter on them.  Then you record the
fact that to derive the "essence" object, you can reduce either
person A's or person B's "original" through the filter, perhaps by
using "git notes" attached to the "essence" object, recording the
object names of these originals (the reason why using notes in this
direction is because you can mechanically determine which "essence"
object any given "original" object reduces to---it is just the
matter of passing it through the filter.  But there can be more than
one "original" that reduces to the same "essence").