
Bug#929606: ITP: dataset-fashion-mnist -- (DL-Policy) A MNIST-like fashion product database.




Package: wnpp
Severity: wishlist
Owner: Mo Zhou <lumin@xxxxxxxxxx>

* Package name    : dataset-fashion-mnist
* URL             : https://github.com/zalandoresearch/fashion-mnist
* License         : MIT
  Description     : A MNIST-like fashion product database.

This is a part of DL-Policy[1]'s experiments.

The first dataset that almost everyone meets when starting to learn machine
learning and deep learning is very likely MNIST[2]. MNIST has been used
for over 20 years by researchers and engineers to validate their
algorithms, learning frameworks, and so on. However, the original MNIST
dataset does not appear to have an explicit license (I could not find
one), so I have to use an alternative: fashion-mnist, a modern drop-in
replacement for MNIST, which has an explicit MIT license.

Packaging this dataset is worthwhile for several reasons:

  1. Dataset size: ~30MiB. Small enough for any modern storage device.
  2. A "unit test" dataset for virtually any deep learning framework.
     Developers can use this dataset to validate any machine learning
     or deep learning framework.
  3. Very easy to train and validate on. This is a tiny "toy" dataset:
     even a weak CPU can train models on it in a reasonable amount of time.
  4. An example of a DL-Policy-compliant dataset package.
  5. The dataset itself is frozen, so the maintenance burden after the
     initial upload is nearly zero.
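Part of what makes fashion-mnist usable as a "unit test" dataset is that it
ships in the same gzip-compressed IDX format as the original MNIST (files
such as train-images-idx3-ubyte.gz and train-labels-idx1-ubyte.gz), so
existing MNIST loaders work unchanged. Below is a minimal, stdlib-only
Python sketch of such a reader. The function names are hypothetical, only
the unsigned-byte IDX type is handled, and no particular install path for
the Debian package is assumed.

```python
import gzip
import struct
from typing import List, Tuple

# IDX type code 0x08 means "unsigned byte"; the only type this sketch handles.
UBYTE = 0x08


def parse_idx(raw: bytes) -> Tuple[List[int], bytes]:
    """Parse an IDX buffer and return (dimensions, flat unsigned-byte payload)."""
    if len(raw) < 4:
        raise ValueError("truncated IDX magic number")
    # Magic number: two zero bytes, a type code, then the number of dimensions.
    zero1, zero2, dtype, ndim = struct.unpack(">BBBB", raw[:4])
    if zero1 != 0 or zero2 != 0 or dtype != UBYTE:
        raise ValueError("not an unsigned-byte IDX buffer")
    header_end = 4 + 4 * ndim
    if len(raw) < header_end:
        raise ValueError("truncated IDX dimension header")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    dims = list(struct.unpack(">" + "I" * ndim, raw[4:header_end]))
    count = 1
    for d in dims:
        count *= d
    payload = raw[header_end:]
    if len(payload) != count:
        raise ValueError(f"expected {count} data bytes, got {len(payload)}")
    return dims, payload


def load_idx_gz(path: str) -> Tuple[List[int], bytes]:
    """Read a gzip-compressed IDX file, e.g. t10k-labels-idx1-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Usage on a synthetic buffer of two 2x2 "images" (not real fashion-mnist data).
demo = struct.pack(">BBBBIII", 0, 0, UBYTE, 3, 2, 2, 2) + bytes(range(8))
dims, pixels = parse_idx(demo)
print(dims, list(pixels))  # [2, 2, 2] [0, 1, 2, 3, 4, 5, 6, 7]
```

On the real image files the dimensions come out as [N, 28, 28] and on the
label files as [N], which is the kind of cheap shape check a framework test
suite can run against this dataset.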

See also:
https://github.com/zalandoresearch/fashion-mnist#why-we-made-fashion-mnist

[1] https://salsa.debian.org/lumin/deeplearning-policy
[2] http://yann.lecun.com/exdb/mnist/