Does dmd have SSE intrinsics?

Mon Sep 21 14:40:33 PDT 2009

Don:
> (1) They don't take advantage of fixed-length arrays. In particular, 
> operations on float[4] should be a single SSE instruction (no function 
> call, no loop, nothing). This will make a huge difference to game and 
> graphics programmers, I believe.
[...]
>It's issue (1) which is the killer.

In my earlier answer I forgot to mention one more small thing.

D's std.gc.malloc() returns pointers aligned to 16 bytes, while I think std.c.stdlib.malloc() doesn't guarantee 16-byte alignment. (I'd like the GC malloc to take a second argument that specifies the alignment; that could save some memory when 16-byte alignment isn't necessary.)
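A minimal sketch of how the two allocators could be compared. It assumes a current D2 toolchain, where the equivalent calls live in core.memory (GC.malloc) and core.stdc.stdlib (malloc) rather than in the D1 modules named above; isAligned16 is a hypothetical helper, not a library function.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;
import std.stdio : writefln;

// Hypothetical helper: true if p sits on a 16-byte boundary.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    void* g = GC.malloc(64);   // block managed by the D garbage collector
    void* m = malloc(64);      // block from the C heap
    scope (exit) free(m);      // only the C block needs an explicit free

    writefln("GC.malloc 16-byte aligned: %s", isAligned16(g));
    writefln("C malloc  16-byte aligned: %s", isAligned16(m));
}
```

A single run can only reveal a misaligned pointer; it cannot prove that an allocator guarantees the alignment, so the result is a hint and not a specification.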

In the following code, if you want the last line to be implemented with a single vector instruction, the arrays a and b have to be aligned to 16 bytes. I think LDC currently doesn't align a and b to 16 bytes.

float[4] a = [1f, 2f, 3f, 4f];
float[4] b = 10f;     // every element set to 10
float[4] c;
c[] = a[] + b[];      // element-wise sum

So you may need syntax like the following, which isn't handy:

align(16) float[4] a = [1f, 2f, 3f, 4f];
align(16) float[4] b = 10f;
align(16) float[4] c;
c[] = a[] + b[];
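As a complete program, the aligned declarations might look like the sketch below. It assumes a D2 compiler; addVec is a hypothetical helper introduced only for illustration. Whether align(16) is actually honored for stack variables, and whether the sum compiles to one SSE instruction, depends on the compiler and its version, so the program prints the alignment it really got instead of assuming it.

```d
import std.stdio : writeln;

// Hypothetical helper: element-wise sum of two 4-float vectors.
float[4] addVec(const float[4] a, const float[4] b)
{
    align(16) float[4] c;
    c[] = a[] + b[];           // array operation, no explicit loop
    return c;
}

void main()
{
    align(16) float[4] a = [1f, 2f, 3f, 4f];
    align(16) float[4] b = 10f;            // every element set to 10
    align(16) float[4] c = addVec(a, b);   // c holds 11, 12, 13, 14

    writeln(c);
    // Reports whether c really landed on a 16-byte boundary.
    writeln((cast(size_t) c.ptr & 15) == 0);
}
```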

A possible solution is to automatically align all static arrays allocated on the stack to 16 bytes too (as the default; it could be changed to save stack space in specific situations) :-)
A note: future CPUs will probably relax the alignment requirements of their vector instructions; it's already happening.

Bye,
bearophile