Web lists-archives.com

RE: [Question] Signature calculation ignoring parts of binary files




On September 12, 2018 7:00 PM, Junio C Hamano wrote:
> "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
> 
> >> author is important to our process. My objective is to keep the
> >> original file 100% exact as supplied and then ignore any changes to
> >> the metadata that I don't care about (like Creator) if the remainder of
the
> file is the same.
> 
> That will *not* work.  If person A gave you a version of original, which
> hashes to X after you strip the cruft you do not care about, you would
> register that original with person A's fingerprint on under the name of X.
> What happens when person B gives you another version, which is not byte-
> for-byte identical to the one you got earlier from person A, but does hash
to
> the same X after you strip the cruft?  If you are going to store it in
Git, and if
> by SHA-1 you are calling what we perceive as "object name" in Git land,
you
> must store that one with person B's fingerprint on it also under the name
of
> X.  Now which version will you get from Git when you ask it to give you
the
> object that hashes to X?

The scenario is slightly different.
1. Person A gives me a new binary file-1 with fingerprint A1. This goes into
git unchanged.
2. Person B gives me binary file-2 with fingerprint B2. This does not go
into git yet.
3. We attempt a git diff between the committed file-1 and uncommitted file-2
using a textconv implementation that strips what we don't need to compare.
4. If file-1 and file-2 have no difference when textconv is used, file-2 is
not added and not committed. It is discarded with impunity, never to be seen
again, although we might whine a lot at the user for attempting to put
file-2 in - but that's not git's issue.
5. If file-1 and file-2 have differences when textconv is used, file-2 is
committed with fingerprint B2.
6. Even if an error is made by the user and they commit file-2 with B2
regardless of textconv, there will be a human who complains about it, but
git has two unambiguous fingerprints that happen to have no diffs after
textconv is applied.

My original hope was that textconv could be used to influence the
fingerprint, but I do not think that is the case, so I went with an
alternative. In the application, I am not allowed to strip any cruft off
file-1 when it is stored - it must be byte-for-byte the original file. This
application is marginally related to a DRM-like situation where we only care
about the original image provided by a user, but any copies that are
provided by another user with modified metadata will be disallowed from
repository.

Does that make more sense? 

Cheers,
Randall