Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"




Hi Paul,

On 2019-05-24 11:50, Paul Wise wrote:
> On Fri, 2019-05-24 at 03:14 -0700, Mo Zhou wrote:
> 
>> Non-free nvidia driver is inevitable.
>> AMD GPUs and OpenCL are not sane choices.
> 
> So no model which cannot be CPU-trained is suitable for Debian main.

I already pointed that out a year ago. Modern DL frameworks
support different computation devices, typically CPU and GPU (CUDA),
and CUDA training is typically tens or hundreds of times faster than
CPU training. I've already raised the question of whether a model
is really free if training it on purely free data with free software
takes 1 year, but merely 1 hour with non-free software.
In that earlier thread people thought this was
not a solvable problem, so I didn't write much about it.

By "can't" I mean "cannot finish within a reasonable time frame".
If I could live for 1e9 years, I'd definitely say non-free
software is not necessary, since training on a weak i3 CPU
would take a relatively short period, say, 100 years.
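
To make the "reasonable time frame" criterion concrete, here is a
rough sketch of the arithmetic. The function names and the 30-day
budget are hypothetical and only for illustration; the speedup
factors are the "tens or hundreds of times" mentioned above, plus
the factor implied by the "1 year vs. 1 hour" example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def cpu_training_hours(gpu_hours, speedup):
    """Estimate CPU-only training time from GPU time and an assumed speedup."""
    return gpu_hours * speedup


def is_cpu_trainable(gpu_hours, speedup, budget_hours):
    """True if the estimated CPU-only training time fits within the budget."""
    return cpu_training_hours(gpu_hours, speedup) <= budget_hours


if __name__ == "__main__":
    budget = 30 * 24  # hypothetical "reasonable" budget: 30 days
    # A model that trains in 1 hour with CUDA, under different assumed speedups.
    for speedup in (10, 100, HOURS_PER_YEAR):
        hours = cpu_training_hours(1.0, speedup)
        ok = is_cpu_trainable(1.0, speedup, budget)
        print(f"speedup {speedup}x: {hours:.0f} CPU hours, within budget: {ok}")
```

What counts as a reasonable budget is exactly the open policy
question; the sketch only shows how strongly the answer depends on
the assumed speedup.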

I updated Difficulty#2 to mention this.
The packages on my radar are unlikely to suffer
from these hard problems.

> https://github.com/hughperkins/coriander

Added to watch list.
 
>> Some good Xeon CPUs can train models as well,
>> and a well optimized linear algebra library
>> helps a lot (e.g. MKL, OpenBLAS). But generally
>> CPU training takes at least 10x longer time to
>> finish. (except some toy networks)
> 
> So only toy networks can enter Debian main?

Not exactly. Some useful stuff can be trained on
CPUs within a reasonable time frame. We can analyze them
once we have some concrete cases.

NLTK-data contains some good examples of useful
"Free Models". I haven't finished inspecting its
contents, but some of its components can meet
the high standard of "Free Model".

As I said at the beginning, the initial draft policy is conservative
and overkill. We can revise it to let more models pass
in the future if people request it.