Bits from /me: A humble draft policy on "deep learning v.s. freedom"
- Date: Tue, 21 May 2019 00:11:14 -0700
- From: Mo Zhou <lumin@xxxxxxxxxx>
A year ago I raised a topic on -devel, pointing out the
"deep learning v.s. software freedom" issue. We drew no
conclusion at that time, and Linux distros that care about
software freedom may still have doubts about some fundamental
questions, e.g. "is this piece of deep learning software
free?"
So far we have been putting this problem off (lazy
evaluation, so to speak). Now that a related package has
come onto my packaging radar, I think I'd better write a
draft that sheds some light on where the safe area is.
Here is my first humble attempt:
(the issue tracker is enabled)
This draft is conservative, even overkill, and currently
focuses only on software freedom. That's exactly where we
Specifically, I defined three types of pre-trained machine
learning / deep learning models:
Free Model, ToxicCandy Model, and Non-free Model.
Developers who'd like to touch DL software should be
cautious about "ToxicCandy" models. Details can be
found in my draft.
Apart from that, I pointed out in the draft that software
associated with any critical task should be considered
carefully, because deep neural networks introduce a new kind
of vulnerability: a network's response can be disrupted or
even controlled by carefully designed perturbations added
to the network input (so-called adversarial examples).
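To make the perturbation point concrete, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a toy linear classifier rather than a real deep network. The weights, input and eps below are made-up numbers for illustration only, and the helper names are hypothetical. The idea: nudge every input feature by at most eps in the direction that lowers the score, and the decision flips.

```python
# Toy illustration of an FGSM-style adversarial perturbation against a
# linear classifier. All numbers are made up; a real attack would target a
# deep network and obtain the gradient via backpropagation.

def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Binary decision: 1 if the score is positive, else 0."""
    return 1 if score(w, b, x) > 0 else 0

def fgsm_linear(w, x, eps):
    """Push the score down by moving each feature eps against sign(w_i).

    For a linear model the gradient of the score with respect to x is w,
    so each feature changes by at most eps (an L-infinity budget).
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.5, -0.3, 0.8, 0.1]   # hypothetical "trained" weights
b = -0.2
x = [0.4, 0.2, 0.3, 0.5]    # a clean input, classified as 1
eps = 0.15                  # per-feature perturbation budget

x_adv = fgsm_linear(w, x, eps)
print(predict(w, b, x), predict(w, b, x_adv))  # prints: 1 0
```

Here the clean score is 0.23, and the perturbation lowers it by eps * sum(|w_i|) = 0.15 * 1.7 = 0.255, so it becomes negative even though no single feature moved by more than 0.15.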
Hence, I suggest that packaging intelligent software
must be discussed on -devel whenever the software is
associated with any kind of critical task, including but
not limited to:
* authentication (e.g. login via face verification or
* program execution (e.g. intelligent voice assistants:
"Hey, Siri! sudo rm -rf / --no-preserve-root")
* physical object manipulation (e.g. mechanical
  arms in non-educational settings, or cars driven
  by autopilot), etc.
See my draft for details.
The package that came onto my packaging radar is nltk_data.
The two most widely used Python-based computational
linguistics toolkits, NLTK and spaCy, require this
data (datasets + models) to enable most of their