SIMD support...
Peter Alexander
peter.alexander.au at gmail.com
Sun Jan 8 09:56:04 PST 2012
On 8/01/12 5:02 PM, Martin Nowak wrote:
> simdop will need more overloads, e.g. some
> instructions need immediate bytes.
> z = simdop(SHUFPS, x, y, 0);
>
> How about this:
> __v128 simdop(T...)(SIMD op, T args);
These don't make a lot of sense as value-returning expressions, e.g.:
__v128 a, b;
a = simdop(movhlps, b); // ???
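movhlps moves the top 64 bits of b into the bottom 64 bits of a and leaves
the top 64 bits of a untouched, so the result depends on the old value of a
as well as on b. That can't be written as an expression of b alone like this.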
It would make more sense to write the instructions just as they appear in
asm:
simdop(movhlps, a, b);
simdop(addps, a, b);
etc.
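To make the operand semantics concrete, here is a minimal scalar model in C of the two instructions used above. It is not the proposed D simdop API, which is only a proposal here. The `xmm` struct and the `movhlps`/`addps` helper names are hypothetical. Each 128-bit register is modelled as four float lanes, with lane 0 being the lowest 32 bits, and the lane behaviour follows the legacy SSE encodings.

```c
/* Scalar stand-in for one 128-bit XMM register: four float lanes,
   lane 0 = lowest 32 bits. Hypothetical model, not real SSE code. */
typedef struct { float lane[4]; } xmm;

/* movhlps dst, src: copy the high 64 bits of src (lanes 2 and 3) into
   the low 64 bits of dst (lanes 0 and 1). Lanes 2 and 3 of dst are left
   unchanged, so the result depends on the old dst, not only on src. */
static void movhlps(xmm *dst, const xmm *src)
{
    dst->lane[0] = src->lane[2];
    dst->lane[1] = src->lane[3];
}

/* addps dst, src: lane-wise dst += src; dst is both input and output. */
static void addps(xmm *dst, const xmm *src)
{
    for (int i = 0; i < 4; ++i)
        dst->lane[i] += src->lane[i];
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, movhlps(&a, &b) leaves a as {7, 8, 3, 4}. Lanes 2 and 3 still come from the old a, which is why the asm-style form treats the first operand as both input and output.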
The differences between this and inline asm would be:
1. Registers are automatically allocated.
2. Loads/stores are inserted when we spill to stack.
3. Instructions can be scheduled and optimised by the compiler.
We could then extend this with user-defined types:
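struct float4
{
    union
    {
        __v128 v;                 // the SIMD register value
        float[4] for_debugging;   // the same 16 bytes, viewable as floats
    }

    // @forceinline: hypothetical attribute demanding guaranteed inlining
    float4 opBinary(string op : "+")(float4 rhs) @forceinline
    {
        __v128 result = v;            // copy, since addps overwrites its first operand
        simdop(addps, result, rhs.v); // result += rhs.v, lane by lane
        return float4(result);
    }
}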
For this to work well, though, we'd need a strong guarantee of inlining and
of redundant loads/stores being removed. We'd also need a guarantee that a
float4 gets the same treatment as a __v128 (since the union holding v is its
only member).
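The wrapper pattern can be modelled the same way. In this sketch C stands in for D, so a plain function `float4_add` plays the role of `opBinary!"+"`. The names `xmm`, `addps` and `float4_add` are hypothetical, and lanes are emulated with scalar floats rather than real SSE registers. It shows how a value-returning operator is built from an in-place two-operand instruction: copy the left operand, then add into the copy.

```c
/* Scalar stand-in for a 128-bit register: four float lanes (hypothetical). */
typedef struct { float lane[4]; } xmm;

/* In-place two-operand addps, modelled lane by lane: dst += src. */
static void addps(xmm *dst, const xmm *src)
{
    for (int i = 0; i < 4; ++i)
        dst->lane[i] += src->lane[i];
}

/* User-defined wrapper whose only member is the register value. */
typedef struct { xmm v; } float4;

/* Value-returning "+" built on the in-place instruction. */
static float4 float4_add(float4 lhs, float4 rhs)
{
    xmm result = lhs.v;       /* like `__v128 result = v;` */
    addps(&result, &rhs.v);   /* like `simdop(addps, result, rhs.v);` */
    float4 out;
    out.v = result;           /* like `return float4(result);` */
    return out;
}
```

Unless a call like this is inlined, the by-value arguments and the local copy are roughly the kind of extra loads/stores that the compiler would need to eliminate for the wrapper to cost nothing.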
More information about the Digitalmars-d
mailing list