Re: Bits from /me: Difficulties in Deep Learning Framework Packaging
- Date: Tue, 16 Apr 2019 14:53:31 +0000
- From: Mo Zhou <lumin@xxxxxxxxxx>
- Subject: Re: Bits from /me: Difficulties in Deep Learning Framework Packaging
On Tue, Apr 16, 2019 at 02:29:54PM +0200, Andreas Tille wrote:
> Thanks a lot for the summary and all your previous work you've spent
> into this. As far as I understand your summary it would be even
> "burning" a student if we would throw theses packaging task on a
> student in a GSoC / outreachy project (I'm aware that we are usually
> not supporting packaging tasks in these projects but it could be an
> exception in case my suspicion would be wrong).
For your reference, at least the following skills are required if a
student wants to package a basic (CPU-only, ISA=generic) version of
Tensorflow:
* Proficiency in Python, and good familiarity with C++.
* Good command of cmake, and the ability to read the bazel build.
* Basic background in machine learning, and knowing how to use
  Tensorflow to train a neural network, so that he/she will be able
  to conduct smoke tests and simple benchmarks.
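To illustrate the kind of smoke test meant above: even before touching
Tensorflow, a student should understand what a training loop does. A toy
pure-Python sketch (not Tensorflow code; the data and learning rate are
made up for illustration) that fits a single linear neuron by gradient
descent:

```python
# Toy smoke test: fit y = 2*x with a single weight by gradient
# descent.  Pure Python; it exercises the same forward/backward
# loop a real Tensorflow smoke test would run on a small network.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0    # the single trainable parameter
lr = 0.02  # learning rate

for step in range(200):
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x                  # forward pass
        grad += 2 * (pred - y) * x    # d(MSE)/dw for one sample
    w -= lr * grad / len(xs)          # gradient step

print(round(w, 3))  # converges towards 2.0
```

If the fitted weight lands near 2.0, the training machinery works; the
same pattern (train briefly, check the result) is what a packaged
Tensorflow should pass.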
The student may also need access to a powerful build machine. My
single Xeon E5-2687v4 takes about 20 minutes to finish a build, while
my laptop (i5-7440HQ) takes ~90 minutes.
The student also needs to compare the existing build systems.
Tensorflow has three sets of build systems:
(1) bazel, the officially maintained build system. Packaging bazel
itself is already challenging enough, and patching the bazel build to
make it policy-compliant is not trivial either. This sounds like a
dead end for the student.
(2) the cmake build. It is not officially supported and is badly
synced with the bazel build. The student could make the cmake build
work again, as long as he/she has enough patience to dig into the
bazel build and update the cmake files accordingly.
(3) the makefile. It targets some embedded devices and only builds a
very core subset of the C++ interface. This is not a sane choice for
a student.
Apart from these, the src:tensorflow in experimental uses a build
system written by myself, which basically builds the shared libraries
manually with an auto-generated ninja build. This build system doesn't
download several gigabytes of dependencies, and doesn't mix up the
output of 72 build processes (thanks to ninja-build). It is able to
produce the full set of libraries, including the shared object for the
Python interface. However, it is still experimental.
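To give a flavour of what "auto-generated ninja build" means (my real
generator is much more involved; the file names below are hypothetical
placeholders, not the actual Tensorflow sources), the generator is
essentially a script that emits rules like these:

```python
# Sketch of an auto-generated ninja build: a script emits a
# build.ninja that compiles a few C++ sources into a shared
# library.  Source file names are made-up placeholders.
sources = ["core/tensor.cc", "core/graph.cc"]

lines = [
    "rule cxx",
    "  command = g++ -fPIC -O2 -c $in -o $out",
    "rule link",
    "  command = g++ -shared $in -o $out",
]
objects = []
for src in sources:
    obj = src.replace(".cc", ".o")
    objects.append(obj)
    lines.append(f"build {obj}: cxx {src}")
lines.append("build libtensorflow.so: link " + " ".join(objects))

ninja = "\n".join(lines) + "\n"
print(ninja)
```

Feeding the generated file to ninja-build then gives parallel builds
with clean, per-target output, which is exactly why the outputs of the
72 build processes don't get mixed up.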
When all the prerequisites have been satisfied, it is still possible
to make some progress. Some people, for example the accessibility
team, may need the basic tensorflow package in the future. A system
such as DeepSpeech, which recognizes speech with pre-trained neural
networks, is doing "inference", i.e. the "forward pass". That process
requires much less hardware performance than the "training" process,
so a basic version of tensorflow is fine for "inference".
Unfortunately my own demand is "training", which makes me quite
unwilling to move forward, because I won't use the basic version
myself. Even if the basic version had been prepared, the problems of
ISA baseline and non-free blobs would still remain.
Licensing is not a problem for the basic version, because there is no
non-free stuff involved, and I've already reviewed the source package
file by file, checking the licenses.
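The inference/training distinction above is easy to see in code:
inference with pre-trained weights is just a forward pass, with no
gradients or optimizer involved. A toy pure-Python sketch (not the
DeepSpeech or Tensorflow API; the weights are invented for
illustration):

```python
# Toy inference: one dense layer with fixed, "pre-trained" weights,
# followed by an argmax.  No gradients, no optimizer -- a forward
# pass is all that inference needs, which is why a basic CPU-only
# tensorflow build is enough for systems like DeepSpeech.
weights = [[0.2, -0.5], [0.8, 0.1]]   # hypothetical trained values
bias = [0.0, 0.1]

def forward(x):
    out = []
    for j in range(len(bias)):
        s = bias[j]
        for i, xi in enumerate(x):
            s += weights[i][j] * xi
        out.append(s)
    return out

scores = forward([1.0, 1.0])
label = max(range(len(scores)), key=scores.__getitem__)
print(label)
```

Training, by contrast, repeats a forward pass plus a backward
(gradient) pass over a large dataset many times, which is where the
hardware demand comes from.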
> Very good to know. Please keep on your great work
It builds a defconfig Linux kernel within a minute.
My interest in a package will drastically decay to zero if I don't
use it...