[dmd-internals] Shift support for vector types (or: is a vector type a first-class type?)

Wed Apr 3 01:51:55 PDT 2013

On 3 April 2013 08:39, Walter Bright <walter at digitalmars.com> wrote:
>
> On 4/2/2013 10:58 PM, Kai Nacke wrote:
>
>
>> While I understand your reasoning, I still feel a bit uncomfortable with
>> it. It creates a situation in which you can't tell me whether a D program
>> will compile just by reading the source, unless I also tell you the target
>> architecture. I think this is something really new.
>> Currently, conditional compilation is used when interfaces etc. are not
>> universally available. This principle is now broken by the "invisible"
>> rules that determine the availability of vector operations.
>
>
> Performant SIMD code is simply not portable between architectures. The
> programmer writing SIMD code ought to be guaranteed he's getting SIMD code,
> not workaround code that is 100x slower. I view a compiler error as being
> far more visible than silently generating unacceptably slow code.
>
>> My approach would be to define the following: if D_SIMD is defined, then
>> only the optimal vector operations are available. This ensures your goal of
>> generating code with optimal performance. If D_SIMD is not defined but a
>> vendor-specific SIMD implementation is available, then the rules of that
>> implementation hold (which may include generating "workaround" code).
>> This has the advantage of being explicit:
>>
>>     version(D_SIMD)
>>     {
>>         uint4 w = ...;
>>         uint4 v = w << 1;
>>     }
>>     else version(XYZ_SIMD)
>>     {
>>         uint4 w = ..., x = ...;
>>         // Not allowed by DMD; only fast on AltiVec
>>         uint4 v = x << w;
>>     }
>>
>> Or am I missing something?
>
>
> All the programmer really needs to do is put the SIMD code for each
> architecture inside a version statement for that architecture, and then
> provide a default branch with the workaround code. The point is that he
> *knowingly* selects the slow workaround code. This is critical for a
> systems programming language, where programmers writing SIMD code are not
> always experts at dumping the compiler output to see what was generated.
>
>> 2nd try: core.bitop.popcnt is a "workaround" for a missing popcnt
>> instruction. LDC provides an intrinsic for popcnt, but it is lowered to the
>> "workaround" code if the popcnt instruction is not available. If we apply
>> the same rules, then this is verboten.
>
>
> The workaround code for popcnt isn't 100x slower.

Actually it is, and we should probably do something about that. (The
"workaround" code is the original; it actually dates from a time
before Intel added the POPCNT instruction to their instruction set!)
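For readers unfamiliar with what a popcount "workaround" looks like, here is a minimal D sketch of the common branch-free (SWAR) bit-counting technique, cross-checked against `core.bitop.popcnt`. The name `softPopcnt` is hypothetical, and this is not claimed to be the code druntime actually ships; it only illustrates the kind of multi-instruction sequence that stands in for a single POPCNT when the hardware instruction is unavailable. No timing figures are implied.

```d
import core.bitop : popcnt;

// Hypothetical software fallback (not druntime's actual code): counts set bits
// using shifts, masks and one multiply instead of a single POPCNT instruction.
int softPopcnt(uint x) pure nothrow @nogc @safe
{
    x = x - ((x >> 1) & 0x5555_5555);                 // bit counts per 2-bit field
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // bit counts per 4-bit field
    x = (x + (x >> 4)) & 0x0F0F_0F0F;                 // bit counts per byte
    return cast(int)((x * 0x0101_0101) >> 24);        // top byte = sum of all bytes
}

void main()
{
    // Cross-check the fallback against druntime's core.bitop.popcnt.
    foreach (uint v; [0u, 1u, 0xFFu, 0x8000_0001u, 0xFFFF_FFFFu])
        assert(softPopcnt(v) == popcnt(v));
}
```

Even this well-known sequence is about a dozen arithmetic operations where POPCNT is one, which is the kind of gap the thread is arguing about.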