Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
- Date: Thu, 23 May 2019 23:37:41 -0700
- From: Mo Zhou <lumin@xxxxxxxxxx>
- Subject: Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
Thanks for your comments.
On 2019-05-23 09:28, Andy Simpkins wrote:
> Your wording "The model /should/ be reproducible with a fixed random seed." feels
> correct, but I wonder if guidance notes along the following lines should be added:
> *unless* we can reproduce the same results, from the same training data,
> you cannot classify as group 1, "Free Model", because verification that
> training has been carried out on the dataset explicitly licensed under a
> free software license can not be achieved. This should be treated as a
> severe bug and the entire suite should be classified as group 2,
> "ToxicCandy Model", until such time that verification is possible.
Ummm... This is actually a bit cruel to upstream... And I think there
is still some misunderstanding. I've updated the document to make the
following points clear:
- "Numerically Reproducible" is the default reproducibility definition
  in this context.
- A Free Model should be Numerically Reproducible, or at least a
  locally-trained model should reach similar performance (e.g. accuracy)
  compared to the original one. Similar results are acceptable; the bar
  of "identical" is not always achievable.
- The datasets used for training a "ToxicCandy" model may be
  private/non-free, so not everybody can access them. (This case is more
  likely a result of problematic upstream licensing, but it sometimes
  happens in practice.)
One got a free model from the internet. That little candy tasted sweet.
One wanted to make this candy at home with the provided recipe, but
surprisingly found out that non-free ingredients are inevitable.
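As a rough illustration of what "Numerically Reproducible" means here, the toy
sketch below mimics a training run driven entirely by a seeded random number
generator. Everything in it (the function name, the fake "training loop", the
parameters) is hypothetical and not from the draft policy; it only shows the
idea that a fixed seed yields bit-identical results, while different seeds
yield different but possibly similarly-performing ones.

```python
import numpy as np

def train_toy_model(seed):
    # Stand-in for a real training run: the "model" is just a weight
    # vector updated by a few noisy gradient-descent-like steps.
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=4)
    for _ in range(10):
        weights -= 0.1 * rng.normal(size=4)
    return weights

# With the same fixed seed, two independent runs yield bit-identical
# weights -- "numerically reproducible" in the sense above.
run_a = train_toy_model(seed=42)
run_b = train_toy_model(seed=42)
assert np.array_equal(run_a, run_b)

# With a different seed the numbers differ; the draft's weaker bar
# ("similar performance") would still be satisfiable in principle.
run_c = train_toy_model(seed=7)
assert not np.array_equal(run_a, run_c)
```

Real frameworks involve more sources of nondeterminism (GPU kernels, thread
scheduling), which is part of why "identical" is not always an achievable bar.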
Is the updated document clearer?