Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
- Date: Thu, 23 May 2019 22:53:15 -0700
- From: Mo Zhou <lumin@xxxxxxxxxx>
- Subject: Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
On 2019-05-22 12:43, Sam Hartman wrote:
> So, I think it's problematic to apply old assumptions to new areas. The
> reproducible builds world has gotten a lot further with bit-for-bit
> identical builds than I ever imagined they would.
I have overhauled the reproducibility section and lowered the
standard from "Bit-by-Bit" to "Numerically" reproducible, which is the most
practical choice for now. We can raise the bar in the future if
reproducibility improves.
> However, what's actually needed in the deep learning context is weaker
> than bit-for-bit identical. What we need is a way to validate that two
> models are identical for some equality predicate that meets our security
> and safety (and freedom) concerns. Parallel computation in the
> training, the sort of floating point issues you point to, and a lot of
> other things may make bit-for-bit identical models hard to come by.
Indeed. I call this "Numerically Reproducible".
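To make the difference between the two equality predicates concrete, here
is a minimal sketch. It assumes the model weights are flattened into a
plain list of Python floats (a real model would be a set of framework
tensors). The function names are hypothetical, and the tolerances are
placeholders, not values the draft policy specifies. Summing the same three
numbers in a different order, as a parallel reduction might, is already
enough to break the checksum while still passing the tolerance check.

```python
import hashlib
import math
import struct


def weights_digest(weights):
    """SHA-256 over the little-endian IEEE-754 double encoding of the weights."""
    packed = struct.pack(f"<{len(weights)}d", *weights)
    return hashlib.sha256(packed).hexdigest()


def bitwise_equal(a, b):
    # Bit-for-bit predicate: identical bytes, hence identical digests.
    return weights_digest(a) == weights_digest(b)


def numerically_equal(a, b, rel_tol=1e-6, abs_tol=1e-9):
    # Numerical predicate (placeholder tolerances): same length and
    # every pair of values within tolerance.
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Same inputs, different summation order: floating-point addition is not
# associative, so the first element differs in its last bits.
run_a = [(0.1 + 0.2) + 0.3, 1.5, -3.25]
run_b = [0.1 + (0.2 + 0.3), 1.5, -3.25]

print(bitwise_equal(run_a, run_b))      # False
print(numerically_equal(run_a, run_b))  # True
```

The checksum predicate is a few lines around a standard hash and is easy to
audit; the tolerance predicate needs someone to pick and justify `rel_tol`
and `abs_tol`, which is where bugs and disagreements are likely to creep in.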
> Obviously we need to validate the correctness of whatever comparison
> function we use. The checksums match is relatively easy to validate.
> Something that for example understood floating point numbers would have
> a greater potential for bugs than an implementation of say sha256.
> So, yeah, bit-for-bit identical is great if we can get it. But
> validating these models is important enough that if we need to use a
> different equality predicate it's still worth doing.
For now, we only need to compare the numbers and the curves: train twice
without any modification, and check whether the training curves (e.g. loss
per epoch) and the reported numbers (e.g. final accuracy) are the same.
Further measures, I think, depend on how this field evolves.
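A rough sketch of that "train twice and compare" check follows. The `train`
function is a hypothetical toy (SGD on a one-variable linear regression with
a seeded RNG), standing in for a real training run; it returns a per-epoch
loss curve and the final weights. The tolerance in `runs_match` is a
placeholder. In pure Python two seeded runs will in fact agree bit-for-bit;
the tolerance only starts to matter with frameworks whose parallel kernels
can reorder floating-point operations.

```python
import math
import random


def train(seed, epochs=20, lr=0.05):
    """Hypothetical toy training run: fit y = w*x + b by SGD.

    Returns (curve, final) where curve is the per-epoch mean squared error
    and final is the (w, b) pair. All randomness comes from one seeded RNG.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    data = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]
    w, b = rng.gauss(0.0, 1.0), 0.0
    curve = []
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
        curve.append(sum(((w * x + b) - y) ** 2 for x, y in data) / len(data))
    return curve, (w, b)


def runs_match(run1, run2, rel_tol=1e-6, abs_tol=1e-12):
    """Compare both the curves and the final numbers within a tolerance."""
    (curve1, final1), (curve2, final2) = run1, run2

    def close(p, q):
        return math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)

    return (
        len(curve1) == len(curve2)
        and all(close(p, q) for p, q in zip(curve1, curve2))
        and all(close(p, q) for p, q in zip(final1, final2))
    )


first = train(seed=42)
second = train(seed=42)  # same code, same data, same seed: no modification
print(runs_match(first, second))  # True
```

Comparing the whole curve rather than only the final numbers catches runs
that happen to land on similar final values through different trajectories.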