SIMD support...

Sun Jan 8 09:56:04 PST 2012

On 8/01/12 5:02 PM, Martin Nowak wrote:
> simdop will need more overloads, e.g. some
> instructions need immediate bytes.
> z = simdop(SHUFPS, x, y, 0);
>
> How about this:
> __v128 simdop(T...)(SIMD op, T args);

These don't make much sense as functions that return a value, e.g.

__v128 a, b;
a = simdop(movhlps, b); // ???

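movhlps moves the top 64 bits of b into the bottom 64 bits of a, leaving 
the top 64 bits of a untouched. Because the result depends on the old 
value of a, it can't be written as an expression in b alone like this.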
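To make the in-place semantics concrete, here is a minimal scalar model of what movhlps does to its two operands. movhlpsModel is a hypothetical helper written for illustration only (it is not part of any library), and float[4] stands in for a 128-bit register.

```d
import std.stdio;

// Hypothetical scalar model of SSE MOVHLPS (not a real library function):
// the high 64 bits of src (floats 2 and 3) overwrite the low 64 bits of dst
// (floats 0 and 1); the high 64 bits of dst are left unchanged.
void movhlpsModel(ref float[4] dst, const float[4] src)
{
    dst[0] = src[2];
    dst[1] = src[3];
    // dst[2] and dst[3] keep their old values, so the result depends on
    // what dst held before: it is not a pure function of src alone.
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [5, 6, 7, 8];
    movhlpsModel(a, b);
    assert(a == [7.0f, 8, 3, 4]);
    writeln(a); // [7, 8, 3, 4]
}
```

The destination is taken by ref because the instruction reads and writes it, which is exactly why a two-operand, asm-style call fits better than a value-returning one.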

It would make more sense to write the instructions as they appear in 
asm:

simdop(movhlps, a, b);
simdop(addps, a, b);
etc.

The differences between this and inline asm would be:

1. Registers are automatically allocated.
2. Loads/stores are inserted when registers spill to the stack.
3. Instructions can be scheduled and optimised by the compiler.

We could then extend this with user-defined types:

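struct float4
{
    union
    {
        __v128 v;               // the 128-bit vector value
        float[4] for_debugging; // the same 128 bits viewed as four floats
    }

    float4 opBinary(string op:"+")(float4 rhs) @forceinline
    {
        __v128 result = v;
        simdop(addps, result, rhs.v); // asm-style: result += rhs.v, in place
        return float4(result);
    }
}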

For this to work well, though, we'd need a strong guarantee of inlining 
and of the removal of redundant loads/stores. We'd also need a guarantee 
that float4s get the same treatment as __v128 (as the union holding the 
__v128 is float4's only member).
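Only the layout half of that last guarantee can be checked from code today. Below is a minimal sketch of a hypothetical portable float4 that uses plain D array operations in place of the proposed __v128 and simdop, plus a static assert that the wrapper is exactly one 128-bit value. It cannot check the other half, that the compiler keeps a float4 in a register exactly as it would a bare __v128.

```d
import std.stdio;

// Hypothetical portable stand-in for the proposed float4: scalar array
// operations replace the proposed __v128/simdop so this compiles today.
struct float4
{
    align(16) float[4] v; // request 16-byte alignment, like a 128-bit XMM value

    float4 opBinary(string op : "+")(float4 rhs) const
    {
        float4 result = this;   // copy; float4 has no indirections
        result.v[] += rhs.v[];  // element-wise add: what ADDPS does in one instruction
        return result;
    }
}

// Layout precondition: the wrapper adds nothing beyond the 128-bit payload.
static assert(float4.sizeof == 16);

void main()
{
    auto x = float4([1.0f, 2, 3, 4]);
    auto y = float4([10.0f, 20, 30, 40]);
    auto z = x + y;
    assert(z.v == [11.0f, 22, 33, 44]);
    writeln(z.v); // [11, 22, 33, 44]
}
```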