Re: RFC: Another proposed hash function transition plan
- Date: Mon, 6 Mar 2017 11:59:59 -0800
- From: Brandon Williams <bmwill@xxxxxxxxxx>
- Subject: Re: RFC: Another proposed hash function transition plan
On 03/06, Linus Torvalds wrote:
> On Mon, Mar 6, 2017 at 10:39 AM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> > I think "nohash" can be explained in 2 points:
> I do think that that was my least favorite part of the suggestion. Not
> just "nohash", but all the special "hash" lines too.
> I would honestly hope that the design should not be about "other
> hashes". If you plan your expectations around the new hash being
> broken, something is wrong to begin with.
> I do wonder if things wouldn't be simpler if the new format just
> included the SHA1 object name in the new object. Put it in the
> "header" line of the object, so that every time you look up an object,
> you just _see_ the SHA1 of that object. You can even think of it as an
> additional protection.
> Btw, the multi-collision attack referenced earlier does _not_ work for
> an iterated hash that has a bigger internal state than the final hash.
> Which is actually a real argument against sha-256: the internal state
> of sha-256 is 256 bits, so if an attack can find collisions due to
> some weakness, you really can then generate exponential collisions by
> chaining a linear collision search together.
> But for sha3-256 or blake2, the internal hash state is larger than the
> final hash, so now you need to generate collisions not in the 256
> bits, but in the much larger search space of the internal hash space
> if you want to generate those exponential collisions.
> So *if* the new object format uses a git header line like
> "blob <size> <sha1>\0"
> then it would inherently contain that mapping from 256-bit hash to the
> SHA1, but it would actually also protect against attacks on the new
> hash. In fact, in particular for objects with internal format that
> differs between the two hashing models (ie trees and commits which to
> some degree are higher-value targets), it would make attacks really
> quite complicated, I suspect.
> And you wouldn't need those "hash" or "nohash" things at all. The old
> SHA1 would simply always be there, and cheap to look up (ie you
> wouldn't have to unpack the whole object).
I'll agree that the "hash" "nohash" bit isn't my favorite and is really
only there to address the signing of tags/commits in this new non-sha1
world. I'm inclined to take a closer look at Jeff's suggestion which
simply has a signature for the hash that the signer cares about.
I don't know if keeping around the SHA1 for every object buys you all
that much. It would add an additional layer of protection but you would
also need to compute the SHA1 for each object indefinitely (assuming you
include the SHA1 in new objects and not just converted objects). The
hope would be that at some point you could not worry about SHA1 at all.
That may be difficult for projects with long history with commit msgs
which reference SHA1's of other commits (if you wanted to look up the
referenced commit, for example), but projects started in the new
non-sha1 world shouldn't have to ever compute a sha1.
Also, during this transition phase you would still need to maintain the
sha1<->sha256 translation table to make looking up objects by their sha1
name in a sha256 repo fast. Otherwise I think it would take a
non-trivial amount of time to search a sha256 repo for a sha1 name. So
if you do include the sha1 in the new object format then you would end
up with some duplicate information, which isn't the end of the world.