Any usable SIMD implementation?

9il via Digitalmars-d digitalmars-d at puremagic.com
Mon Apr 4 11:35:26 PDT 2016


On Monday, 4 April 2016 at 16:21:15 UTC, Marco Leise wrote:
> On Mon, 04 Apr 2016 14:02:03 +0000,
> 9il <ilyayaroshenko at gmail.com> wrote:
> - On amd64, whether floating-point math is handled by the FPU
>   or SSE. When emulating floating-point, e.g. for
>   float-to-string and string-to-float code, it is useful to
>   know where to get the active rounding mode from, since they
>   may differ and at least GCC has a switch to choose between
>   both.
> - For compile time enabling of SSE4 code, a version define is
>   sufficient. Sometimes we want to select a code path at
>   runtime. For this to work, GDC and LDC use a conservative
>   feature set at compile time (e.g. amd64 with SSE2) and tag
>   each SSE4 function with an attribute to temporarily elevate
>   the instruction set. (e.g. @attribute("target", "+sse4"))
>   If you didn't tag the function like that the compiler would
>   error out, because the SSE4 instructions are not supported
>   by a minimal amd64 CPU.
>   To put this to good use, we need a reliable way - basically
>   a global variable - to check for SSE4 (or POPCNT, etc.). What
>   we have now does not work across all compilers.
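
A minimal sketch, in D, of the runtime selection described in the quoted
text. It assumes that core.cpuid reports POPCNT correctly on the compiler
in use, which is exactly what the quoted message says cannot yet be relied
on everywhere. The names popcountPortable, popcountFast and popcount are
hypothetical; in real code the fast variant would also carry the
compiler-specific target attribute.

```d
import core.bitop : popcnt;
import core.cpuid : hasPopcnt;

// Portable fallback: clears the lowest set bit until none remain.
uint popcountPortable(ulong x)
{
    uint n = 0;
    for (; x != 0; x &= x - 1)
        ++n;
    return n;
}

// Stand-in for the tagged fast path. A real version would be marked with
// the compiler's target attribute so it may use the POPCNT instruction.
uint popcountFast(ulong x)
{
    return popcnt(x);
}

// The "global variable" from the quoted text, here a function pointer
// that defaults to the portable path.
__gshared uint function(ulong) popcount = &popcountPortable;

shared static this()
{
    // Assumption: hasPopcnt is reliable on this compiler and CPU.
    if (hasPopcnt)
        popcount = &popcountFast;
}
```

Callers go through popcount and never test the CPU themselves. The cost is
an indirect call per use, which is acceptable for coarse-grained functions
but not for small inner loops.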

@attribute("target", "+sse4") would not work well for BLAS. BLAS 
needs compile-time constants. This is very important because BLAS 
can be 95% portable, so I just need to write code that the 
compiler can optimize very well. --Ilya
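
To illustrate what compile-time constants could mean here, a rough sketch
in D follows. It is not taken from any BLAS implementation. The names
vectorBytes, lanes and dot are hypothetical, and the mapping from the
predefined D_AVX / D_SIMD version identifiers to a register width is an
assumption (which identifiers are set depends on the compiler and its
flags).

```d
// Assumed register width in bytes, fixed when the module is compiled.
version (D_AVX)
    enum size_t vectorBytes = 32;
else version (D_SIMD)
    enum size_t vectorBytes = 16;
else
    enum size_t vectorBytes = 8;

// Elements of type T per assumed register, never less than one.
enum size_t lanes(T) = vectorBytes / T.sizeof > 0 ? vectorBytes / T.sizeof : 1;

// Portable dot product. The inner loop has a constant trip count, so the
// optimizer is free to unroll or vectorize it for the selected target.
T dot(T)(const(T)[] a, const(T)[] b)
{
    assert(a.length == b.length);
    enum n = lanes!T;
    T[n] acc = 0;
    size_t i = 0;
    for (; i + n <= a.length; i += n)
        foreach (j; 0 .. n)
            acc[j] += a[i + j] * b[i + j];
    T sum = 0;
    foreach (x; acc)
        sum += x;
    for (; i < a.length; ++i) // leftover elements
        sum += a[i] * b[i];
    return sum;
}
```

The source stays the same for every target and only the constant changes,
which is the sense in which such code can remain mostly portable.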

