Re: SIMDebian: Debian Partial Fork with Radical ISA Baseline
- Date: Tue, 9 Apr 2019 06:48:59 +0000
- From: Mo Zhou <lumin@xxxxxxxxxx>
Thanks for your helpful pointers.
On Sat, Apr 06, 2019 at 10:55:35PM +0200, Guillem Jover wrote:
> If what you are interested in though is just a small subset of the
> archive, another option that would benefit everyone and is perhaps
> less cumbersome than having to juggle around with multiple archives
> and package rebuilds/variants, is to make use of libc's hwcaps [H]
> support, which means the dynamic linker will automatically load the
> best optimized shared object for the current hardware. This of course
> can complicate a bit the packaging, and bloat it, but if the performance
> improvement is substantial, it might be a very good trade-off.
> [H] man ld.so "NOTES" / "Hardware capabilities"
This sounds like a nice feature. Unfortunately, however, the "avx2" and
"avx512" features I wanted don't show up in that list... IIRC, in my
original post I presented a C++ example using Eigen (a header-only
library). Reverse dependencies such as TensorFlow would benefit from the
hwcaps feature if ld.so supported amd64's AVX2 and AVX-512.
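A likely reason these features have no kernel-supplied hwcap bits on amd64: on x86, Linux fills AT_HWCAP from CPUID leaf 1 EDX, while AVX2 and AVX-512F are reported in CPUID leaf 7 (subleaf 0) EBX. The sketch below is x86-only, assumes GCC's or Clang's `<cpuid.h>`, and reads both words; the function names are hypothetical.

```c
#include <cpuid.h>    /* GCC/Clang, x86 only: __get_cpuid(), __get_cpuid_count(), bit_* masks */
#include <stdbool.h>

/* CPUID leaf 1, EDX. On x86, Linux copies this word into AT_HWCAP, so it
 * bounds what the kernel-supplied hwcap bits can describe (SSE, SSE2, ...). */
unsigned int cpuid_leaf1_edx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf not supported */
    return edx;
}

/* AVX2 and AVX-512F are reported in CPUID leaf 7, subleaf 0, EBX: a word
 * that AT_HWCAP does not carry on x86. */
static unsigned int cpuid_leaf7_ebx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 7 not supported */
    return ebx;
}

/* These check the CPU feature bits only. Before executing AVX code, a real
 * program must also confirm that the OS saves the wider registers
 * (OSXSAVE + XGETBV), which this sketch omits. */
bool cpu_reports_avx2(void)    { return (cpuid_leaf7_ebx() & bit_AVX2) != 0; }
bool cpu_reports_avx512f(void) { return (cpuid_leaf7_ebx() & bit_AVX512F) != 0; }
```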
> Another option which requires upstream code changes (and ideally them
> being complicit) is to add run-time selection for the more suitable
> optimized functions, for example via the __target__ and __ifunc__ [I]
> function __attribute__ (and __builtin_cpu_supports or __builtin_cpu_is),
> or the __target_clones__ function __attribute__. Perhaps also of
> interest is the __simd__ function __attribute__.
> [I] info gcc "Function Attributes";
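A minimal sketch of the two approaches mentioned above, assuming GCC (or a recent Clang) on x86-64 Linux with glibc's ifunc support. The function names (add_clones, add_generic, add_avx2, resolve_add, add_ifunc) are hypothetical, and the loop body is only a stand-in for a real hot kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant 1: target_clones. The compiler builds one copy of the function per
 * listed target plus a "default" copy, and emits an ifunc resolver that picks
 * one at load time. */
__attribute__((target_clones("avx512f", "avx2", "default")))
void add_clones(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Variant 2: hand-written dispatch with the target and ifunc attributes. */
typedef void add_fn(int32_t *, const int32_t *, const int32_t *, size_t);

static void add_generic(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Same source, but the compiler may use AVX2 instructions when vectorizing. */
__attribute__((target("avx2")))
static void add_avx2(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* The resolver runs while relocations are processed, before constructors,
 * so it must call __builtin_cpu_init() before __builtin_cpu_supports(). */
static add_fn *resolve_add(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? add_avx2 : add_generic;
}

/* Callers see one symbol; the dynamic linker binds it to whatever
 * resolve_add() returned. */
void add_ifunc(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
    __attribute__((ifunc("resolve_add")));
```

Variant 1 only adds an attribute line, while variant 2 gives full control over the selection logic. Either way, each hot function needs its own annotation or resolver.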
This compiler feature (which has been considered in the past) is quite a
good solution for small projects. However, it is not easy to enforce in
projects like TensorFlow ...