
Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"




Hi Andy,

On 2019-05-23 17:52, Andy Simpkins wrote:
> Sam,
> While I agree that "assets" in some packages may not ship with their
> sources, and the application may still be in main if it pulls in
> those assets from contrib or non-free,
> I am trying to suggest the same thing here. If the data set is unknown,
> this is the *same* as a dependency on a random binary blob (music /
> fonts / game levels / textures etc.), and we wouldn't put that in main.

The "ToxicCandy Model" term covers a special case. Neither "ToxicCandy"
nor "Non-free" models can enter our main section, as DL-Policy #1 has
stated from the beginning.

> It is my belief that we consider training data sets as 'source' in
> much the same way....

We can indeed interpret training data as a kind of "source". But
sometimes even free "source" gives us trouble. The Wikipedia dump is a
frequently used free corpus in the computational linguistics field.
Do we really want to upload the Wikipedia dump to the archive whenever
a Free Model to be packaged was trained on it?

The Wikipedia dump is so large that it challenges our .deb format
(see recent threads).

See (Difficulties -- Dataset Size):
https://salsa.debian.org/lumin/deeplearning-policy#difficulties-questions-not-easy-to-answer