Any usable SIMD implementation?

Marco Leise via Digitalmars-d digitalmars-d at puremagic.com
Mon Apr 11 08:29:27 PDT 2016


On Wed, 6 Apr 2016 20:29:21 -0700,
Walter Bright <newshound2 at digitalmars.com> wrote:

> On 4/6/2016 7:25 PM, Manu via Digitalmars-d wrote:
> > TL;DR, defining architectures with an intel-centric naming convention
> > is a very bad idea.  
> 
> You're not making a good case for a standard language defined set of definitions 
> for all these (they'll always be obsolete, inadequate and probably wrong, as you 
> point out).

We can define the language either in terms of CPU models or in
terms of features, and Manu gave two good reasons to go with
features:
1) Typically we end up with "version(SSE4)" and similar in our
   code, not "version(Haswell)".
2) On ARM chips it turns out to be difficult to map models to
   features in the first place.
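For comparison, C compilers already expose features rather than models to the preprocessor. A small sketch in C, assuming GCC or Clang: __POPCNT__ and __SSE4_1__ are predefined when options such as -mpopcnt, -msse4.1 or a suitable -march enable those features, and __ARM_NEON comes from the ARM C Language Extensions. The function name selected_path is made up for illustration.

```c
/* Pick a code path by target *feature*, never by CPU model name.
   Which branch survives depends only on the compiler flags in effect. */
static const char *selected_path(void) {
#if defined(__POPCNT__)
    return "x86 with POPCNT";
#elif defined(__SSE4_1__)
    return "x86 with SSE4.1";
#elif defined(__ARM_NEON)
    return "ARM with NEON";
#else
    return "generic";
#endif
}
```

The same source covers a Haswell desktop and an ARM board without either model name appearing anywhere, which is the property the "version(SSE4)" style relies on.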

Manu's point wasn't a case for or against the feature in
general. That said, in the long run Dlang should grow such a
set of definitions. Aside from scientific servers, there are
also a few Linux distributions that compile and install most
packages from source, and telling the compiler to target the
host CPU comes naturally there. In practice, such systems
likely have some config file that sets an environment variable
like CFLAGS to "-march=native".
I understand that DMD doesn't concern itself with all that,
but the D language itself, of which DMD is one implementation,
should not be artificially limited compared to popular C/C++
compilers. I died a bit on the inside when I saw Phobos add
both popcnt and _popcnt, the latter being the version that
uses the POPCNT instruction found in newer x86 CPUs.
In GCC or LLVM, when we use such an intrinsic, the compiler
looks at the compilation target and picks the optimal code at
compile time. In one micro-benchmark [1], POPCNT was roughly
50 times faster than bit-twiddling. If I wanted an SSE4
version in otherwise generic amd64 code, I would add
@attribute("target", "+sse4") before the function using popcnt.
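To make the "one intrinsic" idea concrete, here is a sketch in C assuming GCC or Clang. popcount_swar is the classic bit-twiddling (SWAR) population count that a library has to spell out by hand; popcount_builtin relies on __builtin_popcount, which these compilers turn into a single POPCNT instruction when the target enables it (for example with -mpopcnt or -march=native) and into a software fallback otherwise. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hand-written bit-twiddling: sum bits in 2-, 4- and 8-bit groups,
   then add the four byte counts with one multiply. */
static unsigned popcount_swar(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* One intrinsic: the compiler chooses POPCNT or a fallback based on
   the compilation target, so the source never changes. */
static unsigned popcount_builtin(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```

Compiling the same file with and without -mpopcnt and inspecting the assembly would show whether the builtin became a single instruction; the caller does not need to know either way.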

So in my eyes a system like the one GCC offers, where you can
specify target features on the command line and also override
them for specific functions, is a viable solution. It
simplifies user code (just picking the popcnt, clz, bsr, ...
intrinsic will always be optimal) and Phobos code, by making
_popcnt et al. superfluous. In addition, the compiler could
eventually error out on mnemonics in our inline assembly that
don't exist on the target. That would avoid unexpected
"Illegal Instruction" crashes.
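The per-function override can be sketched in C too, assuming GCC or Clang on x86. Only popcount_hw is compiled with POPCNT enabled via __attribute__((target("popcnt"))), roughly what @attribute("target", "+sse4") does for a D function, while the rest of the file stays at the baseline target. The run-time check with __builtin_cpu_supports is one way to keep CPUs without the instruction from ever executing it; it is an illustration, not something the post proposes. All function names are hypothetical.

```c
#include <stdint.h>

/* Baseline fallback: clear the lowest set bit until none remain. */
static unsigned popcount_generic(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Only this function may use POPCNT; callers must check the CPU first. */
__attribute__((target("popcnt")))
static unsigned popcount_hw(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
#endif

/* Dispatch at run time so a CPU lacking POPCNT never runs popcount_hw. */
static unsigned popcount_dispatch(uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("popcnt"))
        return popcount_hw(x);
#endif
    return popcount_generic(x);
}
```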

[1] http://kent-vandervelden.blogspot.de/2009/10/counting-bits-population-count-and.html

-- 
Marco