Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"


On 2019-05-22 12:43, Sam Hartman wrote:
> So, I think it's problematic to apply old assumptions to new areas.  The
> reproducible builds world has gotten a lot further with bit-for-bit
> identical builds than I ever imagined they would.

I have overhauled the reproducibility section and lowered the
standard from "Bit-by-Bit" to "Numerically" reproducible, which is the
most practical choice for now. We can always raise the bar in the future
if reproducibility improves.
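To make the two standards concrete, here is a minimal sketch of the two equality predicates, assuming the model weights are available as a flat list of Python floats (serialized as little-endian float64 for hashing). The function names and the tolerance defaults are hypothetical and chosen only for illustration; a real policy would have to pick its own tolerances.

```python
import hashlib
import math
import struct


def bitwise_identical(weights_a, weights_b):
    """Bit-for-bit check: compare SHA-256 digests of the raw float64 bytes."""
    def digest(ws):
        # Pack every weight as little-endian float64 and hash the bytes.
        return hashlib.sha256(struct.pack(f"<{len(ws)}d", *ws)).hexdigest()
    return digest(weights_a) == digest(weights_b)


def numerically_identical(weights_a, weights_b, rel_tol=1e-6, abs_tol=1e-9):
    """Numerical check: every pair of weights agrees within a tolerance."""
    if len(weights_a) != len(weights_b):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(weights_a, weights_b))


# 0.1 + 0.2 differs from 0.3 in its last bits, so only the weaker check passes.
run_1 = [0.1 + 0.2, 1.0]
run_2 = [0.3, 1.0]
print(bitwise_identical(run_1, run_2))      # False
print(numerically_identical(run_1, run_2))  # True
```

The hash comparison is trivially easy to validate but fails on any rounding difference; the tolerance-based one survives such differences at the cost of choosing, and justifying, a tolerance.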

> However, what's actually needed in the deep learning context is weaker
> than bit-for-bit identical.  What we need is a way to validate that two
> models are identical for some equality predicate that meets our security
> and safety (and freedom) concerns.  Parallel computation in the
> training, the sort of floating point issues you point to, and a lot of
> other things may make bit-for-bit identical models hard to come by.

Indeed. I call this "Numerically Reproducible".

> Obviously we need to validate the correctness of whatever comparison
> function we use.  The checksums match is relatively easy to validate.
> Something that for example understood floating point numbers would have
> a greater potential for bugs than an implementation of say sha256.
> So, yeah, bit-for-bit identical is great if we can get it.  But
> validating these models is important enough that if we need to use a
> different equality predicate it's still worth doing.

For now, we only need to compare the reported numbers and the training
curves: train twice without any modification, and check whether the
curves and numbers match. Further measures, I think, depend on how this
field evolves.
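The "train twice and compare" check could be sketched roughly as below, assuming each run logs its metrics as a dict mapping a metric name to a list of values (e.g. per-epoch loss, final accuracy). The function name, the dict layout, the tolerances and the sample numbers are all hypothetical, made up only to show the idea; they are not results from any real training run.

```python
import math


def diverging_metrics(run_a, run_b, rel_tol=1e-3, abs_tol=1e-6):
    """Return the sorted names of metrics whose curves differ between two runs."""
    mismatched = []
    for name in sorted(run_a.keys() | run_b.keys()):
        a, b = run_a.get(name), run_b.get(name)
        # A metric missing from one run, or logged a different number of
        # times, counts as a mismatch.
        if a is None or b is None or len(a) != len(b):
            mismatched.append(name)
            continue
        # Otherwise every logged point must agree within the tolerance.
        if not all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                   for x, y in zip(a, b)):
            mismatched.append(name)
    return mismatched


# Hypothetical logs from two unmodified training runs.
first = {"loss": [2.30, 1.10, 0.52], "accuracy": [0.91]}
second = {"loss": [2.30, 1.1000004, 0.52], "accuracy": [0.93]}
print(diverging_metrics(first, second))  # ['accuracy']
```

An empty list means the two runs are numerically reproducible under the chosen tolerance; anything else names the curves that need a closer look.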