
Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

On 22/05/2019 03:53, Mo Zhou wrote:
> Hi Tzafrir,
>
> On 2019-05-21 19:58, Tzafrir Cohen wrote:
>> Is there a way to prove in some way (reproducible build or something
>> similar) that the results were obtained from that set using the specific
>
> I wrote a dedicated section about reproducibility:
>
>> I suppose that the answer is negative, but it would have been nice to
>> have that.
>
> In simple cases, fixing the seed for the random number generator is
> enough.
>
> If an upstream has ever claimed that their project aims to be of high
> quality, then being unable to reproduce its results is very likely a
> fatal bug.
>
> Reproducibility is also a headache among the machine learning and
> deep learning communities. They are trying to improve the situation.
> Everyone likes reproducible bits.
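
To illustrate the "simple case" mentioned above, here is a minimal sketch, assuming a toy one-parameter model trained with SGD on the CPU. The `train` function and the data are hypothetical and not taken from the draft policy. All randomness (initialisation and sample order) comes from a single seeded generator, so two runs with the same seed should give identical weights on the same machine and interpreter.

```python
import random


def train(data, seed, epochs=20, lr=0.01):
    """Fit y = w * x with SGD, drawing all randomness from one seeded generator."""
    rng = random.Random(seed)       # private generator, independent of global state
    w = rng.uniform(-1.0, 1.0)      # random initialisation
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)        # random sample order in every epoch
        for x, y in samples:
            w -= lr * 2 * (w * x - y) * x   # gradient step on the squared error
    return w


# Hypothetical training data, roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

# Same data and same seed: the two runs should agree exactly.
same = train(data, seed=42) == train(data, seed=42)
print("reproducible:", same)
```

Real frameworks have further sources of nondeterminism (parallel reductions, GPU kernels, library versions), which this sketch does not cover.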

I agree completely.

Your wording "The model /should/ be reproducible with a fixed random seed." feels correct, but I wonder whether guidance notes along the following lines should be added?

    *Unless* the same results can be reproduced from the same training
    data, the model cannot be classified as group 1, "Free Model", because
    it cannot be verified that training was carried out on the dataset
    explicitly licensed under a free software license.  This should be
    treated as a severe bug and the entire suite should be classified as
    group 2, "ToxicCandy Model", until such time that verification is
    possible.
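
A rough sketch of how such a verification might look, assuming upstream publishes a digest of the trained weights together with the seed. The names `train`, `digest` and `reproducibility_verdict` are hypothetical, the model is a toy, and only the reproducibility criterion is checked here, not licensing.

```python
import hashlib
import random
import struct


def train(data, seed, epochs=20, lr=0.01):
    """Toy deterministic training run: fit y = w * x with seeded SGD."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w -= lr * 2 * (w * x - y) * x
    return [w]


def digest(weights):
    """SHA-256 over the weights packed as big-endian IEEE 754 doubles."""
    packed = struct.pack(f">{len(weights)}d", *weights)
    return hashlib.sha256(packed).hexdigest()


def reproducibility_verdict(published_digest, data, seed):
    """Retrain from the claimed dataset and compare against the published digest."""
    rebuilt = digest(train(data, seed))
    return "Free Model" if rebuilt == published_digest else "ToxicCandy Model"


# Hypothetical dataset and the digest upstream would publish for it.
claimed_data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
published = digest(train(claimed_data, seed=42))

# Retraining on the claimed dataset with the published seed.
verdict_ok = reproducibility_verdict(published, claimed_data, seed=42)

# Retraining on a dataset that differs from the one actually used.
other_data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 9.0)]
verdict_bad = reproducibility_verdict(published, other_data, seed=42)

print(verdict_ok, "/", verdict_bad)
```

A bitwise digest is the strictest possible check. For real models a tolerance-based comparison may be more practical, but that would weaken the guarantee that the stated dataset was the one used.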

Thank you for your work on this.