Re: SIMDebian: Debian Partial Fork with Radical ISA Baseline
- Date: Sat, 6 Apr 2019 22:55:35 +0200
- From: Guillem Jover <guillem@xxxxxxxxxx>
- Subject: Re: SIMDebian: Debian Partial Fork with Radical ISA Baseline
On Fri, 2019-02-08 at 16:25:41 +0000, Mo Zhou wrote:
> For most programs the "-march=native" option is not expected to bring any
> significant performance improvement. However for some scientific applications
> this proposition doesn't hold. When I was creating the tensorflow debian
> package, I observed a significant performance gap between generic code and
> kabylake (Intel 7XXX Series) code.
> Having seen such interesting results, I immediately created a Debian partial
> fork named SIMDebian (SIMD + Debian). It makes great sense to some
> applications due to the significant performance gain brought by SIMD code.
> Currently this partial fork is still in the very early stage, and it needs
> * More experience about software that benefits a lot from SIMD code
> (e.g. What package would potentially benefit from SIMD code?)
> * Suggestions and comments
> (e.g. Is such a partial fork really useful and valuable?)
> * More people interested in this
> SIMDebian is only a PARTIAL fork, which means that it only takes care of
> packages that would obviously benefit from SIMD code, because no performance
> gain is expected in terms of the majority of packages in the Debian archive.
There's been talk in the past about this, AFAIR the most recent one
previous to this was about the various MIPS ISAs (?). We covered this
in the Debian Bootstrap sprint in 2014 (see §2):
There's not been much progress there, as it seemed like interest had
If what you are interested in though is just a small subset of the
archive, another option that would benefit everyone and is perhaps
less cumbersome than having to juggle multiple archives
and package rebuilds/variants, is to make use of libc's hwcaps [H]
support, which means the dynamic linker will automatically load the
best optimized shared object for the current hardware. This of course
can complicate a bit the packaging, and bloat it, but if the performance
improvement is substantial, it might be a very good trade-off.
[H] man ld.so "NOTES" / "Hardware capabilities"
Another option, which requires upstream code changes (and ideally
upstream's cooperation), is to add run-time selection of the most suitable
optimized functions, for example via the __target__ and __ifunc__ [I]
function __attribute__ (and __builtin_cpu_supports or __builtin_cpu_is),
or the __target_clones__ function __attribute__. Perhaps also of
interest is the __simd__ function __attribute__.
[I] info gcc "Function Attributes";
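
To make the run-time selection idea concrete, here is a minimal sketch of
manual dispatch using the __target__ attribute and __builtin_cpu_supports.
The function names (sum_generic, sum_avx2, sum_floats) and the choice of
AVX2 are assumptions made purely for illustration; they do not come from
any package discussed in this thread. The x86-specific part is guarded so
the file still builds on other architectures, where only the baseline is
used.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline version: built for whatever ISA the compiler targets by default. */
static float sum_generic(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* Same loop, but the compiler is allowed to emit AVX2 instructions for
 * this one function only (hypothetical choice of ISA extension). */
static __attribute__((__target__("avx2")))
float sum_avx2(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
#endif

/* Hypothetical public entry point: picks an implementation at run time,
 * so one binary works on both old and new CPUs. */
float sum_floats(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();               /* safe to call more than once */
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
#endif
    return sum_generic(v, n);
}
```

Callers only ever see sum_floats(). With the __target_clones__ attribute
the compiler generates the per-ISA clones and the resolver itself, so the
explicit check above is not needed; the manual form is shown here only
because it makes the mechanism visible.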