Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

Whilst I agree that "assets" in some packages may not have sources with them, and that the application may still be in main if it pulls in those assets from contrib or non-free, I am trying to suggest the same thing here. If the data set is unknown, this is the *same* as a dependency on a random binary blob (music / fonts / game levels / textures etc.), and we wouldn't put that in main.

It is my belief that we consider training data sets as 'source' in much the same way....


On 23 May 2019 16:33:24 BST, Sam Hartman <hartmans@xxxxxxxxxx> wrote:
"Andy" == Andy Simpkins <rattusrattus@xxxxxxxxxx> writes:

Andy> *unless* we can reproduce the same results, from the same
Andy> training data, you cannot classify as group 1, "Free Model",
Andy> because verification that training has been carried out on the
Andy> dataset explicitly licensed under a free software license can
Andy> not be achieved. This should be treated as a severe bug and
Andy> the entire suite should be classified as group 2, "ToxicCandy
Andy> Model", until such time that verification is possible.

I don't think that's entirely true.
If we've done the training we can have confidence that it's free.
Reproducibility is still an issue, but is no more or less an issue than
with any other software.

Consider how we treat assets for games or web applications. And yes
there are some confusing areas there and areas where we'd like to
improve. But let's be consistent in what we demand from various
communities to be part of Debian. Let's not penalize people for being
new and innovative.

