SIMD support...

Sat Jan 7 18:25:54 PST 2012

On 8/01/12 1:32 AM, Manu wrote:
> On 8 January 2012 02:54, Peter Alexander <peter.alexander.au at gmail.com
> <mailto:peter.alexander.au at gmail.com>> wrote:
>
>     I agree with Manu that we should just have a single type like __m128
>     in MSVC. The other types and their conversions should be solvable in
>     a library with something like strong typedefs.
>
>
> Walter put in a reasonable effort to sway me to his side of the fence
> last night. I'm still not entirely sold that implementation inside the
> language is necessary to achieve these details, but I don't have enough
> background info to argue, and I'm not the one that has to maintain the
> code :)
>
> Here are some points we discussed... how do we do these (efficiently) in
> a library?

Just to be clear, it was only the types and conversions that I thought
would be suitable for a library. Operations, along with their
optimisations, are best left to the compiler.

> ** Literal syntax.. and constant folding:
>
> Constants and literals also need to be aligned. If we use array syntax
> to express literals, this will be a problem.
>
>   int4 v = [ 1,2,3,4 ] + [ 5,6,7,8 ];
>
> Any constant expressions need to be simplified at compile time: int4 vec
> = [ 6,8,10,12 ];
> Perhaps this is possible with CTFE? Or will it be automatic if you
> express literals as if they were arrays?

You could use array syntax for vector literals, as long as they are 
stored directly into vector variables. e.g.

immutable int4 a = [1, 2, 3, 4];
immutable int4 b = [5, 6, 7, 8];
int4 v = a + b;

Constant folding can be done by the compiler, although I don't think
this is a priority.
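
As a rough sketch of the pattern above, assuming D's core.simd module
on an x86-64 target with SSE2 (the function name sumOfLiterals is made
up for illustration), the literals initialise vector variables directly
and the sum is read back through the vector's .array property:

```d
import core.simd;

// Hypothetical helper: adds two vector constants and returns the lanes.
int[4] sumOfLiterals()
{
    int4 a = [1, 2, 3, 4];   // array literal stored directly into a vector
    int4 b = [5, 6, 7, 8];
    int4 v = a + b;          // element-wise integer add
    return v.array;          // copy the four lanes out as a static array
}
```

Whether the compiler folds the addition at compile time or emits a
vector add at run time, the observable result should be the same.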

> ** Expression interpretation/simplification:
>
>   float4 v = -b + a;
>
> Obviously, this should be simplified to 'a - b'.
>
>   float4 v = a*b + c;
>
> This should use a multiply-accumulate opcode on most architectures:
> FMADDPS v, a, b, c

The compiler should make these decisions, just like it does with
int/float etc. In some cases these kinds of simplifications can affect
the result due to floating-point rounding.
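
To illustrate the general hazard (not the two specific rewrites quoted
above), here is a small sketch assuming D's core.simd on x86-64; the
function name reassociate is hypothetical. Regrouping a floating-point
sum changes the answer because the intermediate result is rounded:

```d
import core.simd;

// Hypothetical helper: returns lane 0 of (a + b) + c and of a + (b + c).
float[2] reassociate()
{
    float4 a = [1e30f, 1e30f, 1e30f, 1e30f];
    float4 b = [-1e30f, -1e30f, -1e30f, -1e30f];
    float4 c = [1.0f, 1.0f, 1.0f, 1.0f];

    float4 left  = (a + b) + c;  // a + b is exactly 0, so each lane is 1
    float4 right = a + (b + c);  // b + c rounds back to b, so each lane is 0

    return [left.array[0], right.array[0]];
}
```

A fused multiply-add raises a similar concern, since it rounds once
where a separate multiply and add round twice.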

You can use expression templates for this sort of thing as well, but 
they are a horrible mess, so I don't think I'd like to see them.

> ** Typed debug info
>
> In a debugger it's nice to inspect variables in their supposed type.
> Can probably use unions to do this... probably wouldn't be as nice though.

Good point. I'm not an expert on this, but I suspect that a union would 
be good enough?
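
A minimal sketch of what such a union might look like, assuming D's
core.simd on x86-64 (the names Vec4View and makeView are invented
here); how well a given debugger displays it is not something this
shows:

```d
import core.simd;

// Hypothetical overlay: the same 16 bytes seen as a vector or as lanes.
union Vec4View
{
    float4   vec;     // the value the SIMD code operates on
    float[4] floats;  // lane view that can be printed as four floats
    uint[4]  bits;    // raw bit pattern of each lane
}

// Hypothetical helper: fills the vector member so the other views can be read.
Vec4View makeView()
{
    float4 v = [1.0f, 2.0f, 3.0f, 4.0f];
    Vec4View view;
    view.vec = v;
    return view;
}
```

Reading makeView().floats gives the four lanes as plain floats, and
bits[0] holds the IEEE-754 encoding of 1.0f.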

> ** God knows what other optimisations
>
> float4 v = [ 0,0,0,0 ]; // XOR v
> etc...

Again, I think you could use expression templates for this, but it's so 
much simpler to leave this optimisation to the compiler.

Even if the compiler doesn't do it, it's not difficult to do it manually 
when you really need it:

float4 v = void;
asm { pxor v, v; }

Honestly, I'm not too bothered with these types of optimisations. As 
long as the compiler does the register allocation and instruction 
scheduling for me, I would be 99% happy because those things are the 
most tedious when trying to write structured code. I can easily enough 
change (-b + a) to (a - b) if that's faster, or insert specific
instructions for generating vector constants, or do constant folding 
manually.

Of course, it would be nice if the compiler did them, but that's just 
icing on the cake. The meat of the problem is register allocation.

> I don't know what amount of this is achievable with libraries, but
> Walter seems to think this will all work much better in the language...
> I'm inclined to trust his judgement.

I agree.