[dmd-internals] Shift support for vector types (or: is vector type a first class type?)

Tue Apr 2 23:39:05 PDT 2013

On 4/2/2013 10:58 PM, Kai Nacke wrote:
>
> While I understand your argument, I still feel a bit uncomfortable with 
> it. It creates a situation in which you can't tell me whether a D program 
> will compile just by reading the source, unless I also tell you the target 
> architecture. I think this is something really new.
> Until now, conditional compilation has been used when interfaces etc. are 
> not globally available. This principle is now broken by the "invisible" 
> rules that determine the availability of vector operations.

Performant SIMD code is simply not portable between architectures. The 
programmer writing SIMD code ought to be guaranteed he's getting SIMD code, not 
workaround code that is 100x slower. I view a compiler error as being far more 
visible than silently generating unacceptably slow code.

> My approach would be to define the following: if D_SIMD is defined then only 
> the optimal vector operations are available. This ensures your goal of 
> generating code with optimal performance. If D_SIMD is not defined but a 
> vendor specific SIMD implementation is available then the rules of this 
> implementation hold (which may include generation of "workaround" code). This 
> has the advantage of being explicit:
>
>     version(D_SIMD)
>     {
>         uint4 w = ...;
>         uint4 v = w << 1;
>     }
>     else version(XYZ_SIMD)
>     {
>         uint4 w = ..., x = ...;
>         // Not allowed by DMD; only fast on AltiVec
>         uint4 v = x << w;
>     }
>
> Or am I missing something?

All the programmer really needs to do is put the SIMD code for each 
architecture under a version statement for that architecture, and then supply 
a default branch with the workaround code. The point is that he then 
*knowingly* selects the slow workaround code. This is critical for a systems 
programming language, where programmers writing SIMD code are not always 
experts at dumping the compiler output to see what was generated.
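A minimal sketch of that pattern, assuming a compiler that provides `core.simd`'s `uint4` with lane-wise `+` on x86-64 (as DMD does). `doubleLanes` is a hypothetical helper and `X86_64` is just an example architecture. Because whether `<<` compiles on vector types is exactly what this thread is debating, the SIMD branch doubles each lane with `v + v` (which equals `x << 1` for unsigned lanes) rather than relying on a vector shift:

```d
// Hypothetical helper: doubles each of four uint lanes.
uint[4] doubleLanes(uint[4] a)
{
    version (X86_64)
    {
        // Assumed: core.simd supports uint4 and lane-wise + on this target.
        import core.simd : uint4;
        uint4 v;        // vectors default-initialize to zero
        v.array = a;    // load the four lanes
        v = v + v;      // lane-wise add; same result as x << 1 with wraparound
        return v.array;
    }
    else
    {
        // Default branch: scalar workaround code, selected knowingly.
        uint[4] r;
        foreach (i, x; a)
            r[i] = x << 1;
        return r;
    }
}

void main()
{
    import std.stdio : writeln;
    writeln(doubleLanes([1, 2, 3, 4])); // prints [2, 4, 6, 8]
}
```

On a target without the version branch, the code still compiles, but only because the programmer wrote the scalar fallback explicitly.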

> 2nd try: core.bitop.popcnt is a "workaround" for a missing popcnt instruction. 
> LDC provides an intrinsic for popcnt, but it is lowered to the "workaround" 
> code if the popcnt instruction is not available. If we apply the same rules, 
> then this is verboten.

The workaround code for popcnt isn't 100x slower.
