- Date: Tue, 21 May 2019 11:07:09 +0200
- From: Andreas Tille <andreas@xxxxxxxx>
- Subject: Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"
Thanks again for all your effort on Deep Learning in Debian.
Please note that I'm not competent in this field.
On Tue, May 21, 2019 at 12:11:14AM -0700, Mo Zhou wrote:
> (issue tracker is enabled)
Not sure whether this is sensible to add to the issue tracker.
> See my draft for details.
Quoting from your section "Questions Not Easy to Answer"
1. Must the dataset for training a Free Model be present in our archive?
   A Wikipedia dump is a frequently used free dataset in computational
   linguistics; is uploading a Wikipedia dump to our Archive sane?
I have no idea about the size of this kind of dump. Recently I've read
that data sets for other programs tend toward 1 GB. In Debian Med I'm
maintaining metaphlan2-data at 204 MB, which would be even larger if it
did not use a method of "data reduction" that other DDs consider a bug
(#839925).
2. Should we re-train the Free Models on buildd? This is crazy. Let's
   not do that right now.
If you ask me, bothering buildd with this task is insane. However, I'm
positively convinced that we should ship the training data and be able
to train the models from it.
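
To make the "ship the data, retrain on demand" idea concrete, here is a
minimal sketch in Python. The toy unigram model, the sample data, and the
path in the comment are all my own illustrative assumptions, not anything
from the draft policy; the point is only that a Free Model can be a pure
function of training data that the package also ships.

```python
# Hypothetical sketch: rebuild a "Free Model" from training data shipped
# in the source package, instead of distributing opaque pretrained weights.
# The model and all names here are assumptions for illustration.

def train(samples):
    """Fit a trivial per-word label-frequency model from (text, label) pairs."""
    counts = {}
    for text, label in samples:
        for word in text.split():
            counts.setdefault(word, {}).setdefault(label, 0)
            counts[word][label] += 1
    return counts

def predict(model, text):
    """Pick the label whose words occur most often in the text, or None."""
    scores = {}
    for word in text.split():
        for label, n in model.get(word, {}).items():
            scores[label] = scores.get(label, 0) + n
    return max(scores, key=scores.get) if scores else None

# In a real package, the samples would be read from the shipped dataset
# (e.g. somewhere under /usr/share/) and the model serialized at build time,
# making the whole artifact reproducible from free inputs.
data = [("free software rocks", "pos"), ("nonfree blobs are bad", "neg")]
model = train(data)
print(predict(model, "free software"))  # -> pos
```

Whether such retraining happens on buildd or in a separate, opt-in step is
exactly the policy question discussed above; the sketch is agnostic to that.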