
Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

Hi Andy,

Thanks for your comments.

On 2019-05-23 09:28, Andy Simpkins wrote:
> Your wording "The model /should/be reproducible with a fixed random seed." feels
> correct but wonder if guidance notes along the following lines should be added?
> 
>     *unless* we can reproduce the same results, from the same training data,
>     you cannot classify as group 1, "Free Model", because verification that
>     training has been carried out on the dataset explicitly licensed under a
>     free software license can not be achieved.  This should be treated as a
>     severe bug and the entire suite should be classified as group 2,
>     "ToxicCandy Model", until such time that verification is possible.

Ummm... This would actually be a bit harsh on upstream. I also think
there is still some misunderstanding, so I've updated the document to
make the following points clear:

- "Numerically Reproducible" is the default definition of
  reproducibility in this context:
  https://salsa.debian.org/lumin/deeplearning-policy#neural-network-reproducibility

- A Free Model should be Numerically Reproducible, or at least a
  locally trained model should reach similar performance (e.g.
  accuracy) to the original one.

  Similar results are acceptable; an "identical" result is not always
  reachable.

- The datasets used to train a "ToxicCandy" model may be private or
  non-free, so not everybody can access them. (This case is most likely
  the result of problematic upstream licensing, but it does happen.)

  One got a free model from the internet. That little candy tastes sweet.
  One wanted to make this candy at home with the provided recipe, but
  was surprised to find that non-free ingredients are unavoidable.
    -- ToxicCandy
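To make the two reproducibility criteria concrete, here is a minimal
sketch in pure Python. It assumes a toy logistic-regression model trained
with plain SGD on a synthetic dataset; all function names and the 0.05
accuracy tolerance are hypothetical illustrations, not part of the draft
policy. Training twice with the same seed yields bit-identical weights
("Numerically Reproducible"). Training with a different seed yields
different weights but similar accuracy, which is the weaker bar the draft
accepts. Real deep learning frameworks, especially on GPUs, may need extra
configuration before they produce bit-identical results; this toy
sidesteps that.

```python
import math
import random


def make_dataset(n=200, seed=0):
    """Hypothetical toy 2-D dataset standing in for a free training set."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x1, x2), 1 if x1 + x2 > 0 else 0))
    return data


def sigmoid(z):
    """Numerically stable logistic function (avoids math.exp overflow)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, seed, epochs=20, lr=0.1):
    """Logistic regression via SGD; every random choice comes from `seed`."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 0.1), rng.gauss(0, 0.1)]  # seeded initialisation
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # sample order depends only on the seed
        for i in order:
            (x1, x2), y = data[i]
            g = sigmoid(w[0] * x1 + w[1] * x2 + b) - y  # log-loss gradient
            w[0] -= lr * g * x1
            w[1] -= lr * g * x2
            b -= lr * g
    return w, b


def accuracy(model, data):
    """Fraction of points classified correctly (threshold at 0.5)."""
    w, b = model
    hits = sum(1 for (x1, x2), y in data
               if (w[0] * x1 + w[1] * x2 + b > 0) == (y == 1))
    return hits / len(data)


def check_reproducibility(data, seed=42, other_seed=7, tol=0.05):
    """Return (numerically_reproducible, similar_performance)."""
    run_a = train(data, seed)
    run_b = train(data, seed)        # same seed: should match bit for bit
    run_c = train(data, other_seed)  # different seed: different weights
    identical = run_a == run_b
    # Accuracy is measured on the training set only for brevity; a real
    # comparison would use a held-out evaluation set.
    similar = abs(accuracy(run_a, data) - accuracy(run_c, data)) <= tol
    return identical, similar


if __name__ == "__main__":
    print(check_reproducibility(make_dataset()))
```

The exact-equality check corresponds to "Numerically Reproducible"; the
tolerance-based check corresponds to "similar performance". Where the
tolerance should sit for a real model is a policy decision this sketch
does not attempt to settle.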

Is the updated document clearer?