System programming in D (Was: The God Language)

Wed Jan 4 17:09:29 PST 2012

On 5/01/12 12:42 AM, bearophile wrote:
> Manu:
>
>> I'm not referring to vector OPERATIONS. I only refer to the creation of a
>> type to identify these registers...
>
> Please, try to step back a bit and look at this problem from a bit more distance. D has vector operations, and so far they have received only a tiny amount of love. Are you able to find some ways to solve some of your problems using a hypothetical much better implementation of D vector operations? Please, think about the possibilities of this syntax.
>
> Think about future CPU evolution with SIMD registers 128, then 256, then 512, then 1024 bits long. In theory a good compiler is able to use them with no changes in the D code that uses vector operations.
>
> Intrinsics are an additive change; adding them later is possible. But I think fixing the syntax of vector ops is more important. I have some bug reports in Bugzilla about vector ops that have been sleeping there for two years or so, and they are not about implementation performance.
>
> I think the good Hara will be able to implement those syntax fixes in a matter of just one day or very few days if a consensus is reached about what actually is to be fixed in D vector ops syntax.
>
> Instead of discussing *adding* something (register intrinsics), I suggest discussing what to fix in the *already present* vector op syntax. This is not a request to you alone, Manu, but to this whole newsgroup.
>
> Bye,
> bearophile

D has no alignment support, so there is no way to specify that you want 
a float[4] to be aligned on a 16-byte boundary, which means there is no 
way for the compiler to generate code that exploits SSE well (e.g. 
aligned loads and stores). It has to be conservative and assume the 
data is unaligned.
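For comparison, here is a minimal sketch of what an alignment guarantee buys, written in C11 with the x86 SSE intrinsics from xmmintrin.h (assumes an SSE-capable x86 target). The vec4f struct and the helper names are hypothetical; alignas stands in for the proposed D align(16). When the type guarantees 16-byte alignment, the aligned load/store (_mm_load_ps/_mm_store_ps, i.e. MOVAPS, which faults on a misaligned address) can be used; for an arbitrary float pointer only the unaligned form (_mm_loadu_ps, MOVUPS) is safe.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical vec4f: alignas(16) plays the role of the proposed
   D declaration `alias align(16) float[4] vec4f;`. */
typedef struct {
    alignas(16) float v[4];
} vec4f;

/* Returns nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}

/* The type guarantees alignment, so the aligned load/store is legal. */
static vec4f add_aligned(const vec4f *a, const vec4f *b) {
    assert(is_aligned16(a) && is_aligned16(b));
    vec4f r;
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a->v), _mm_load_ps(b->v)));
    return r;
}

/* No alignment guarantee: must use the unaligned load/store. */
static void add_unaligned(const float *a, const float *b, float *out) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

Passing buf + 1 of a plain float array to add_unaligned is fine, whereas calling _mm_load_ps on such a pointer could fault; that is the conservative assumption a compiler has to make when the language gives it no alignment information.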

Suppose alignment support is added:

alias align(16) float[4] vec4f;

vec4f a, b;
...
a[0] = a[3];
a[1] = a[2];
a[2] = b[0];
a[3] = b[1];

Is it reasonable to expect compilers to generate a single shuffle 
instruction from this? What about more complicated code, like computing 
a dot product? What D code do I write to get the compiler to generate 
the expected machine code?
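For comparison, this is roughly what the intrinsics route looks like, sketched in C with the x86 SSE intrinsics from xmmintrin.h (the kind of API under debate for D; assumes an SSE-capable target, and the helper names are hypothetical). The four scalar assignments above, which read a[3] and a[2] before overwriting them, produce {a[3], a[2], b[0], b[1]}, which is exactly one SHUFPS. The dot product uses only SSE1: a lane-wise multiply followed by a horizontal sum built from two shuffle+add steps.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Scalar reference: the D statements from the post, transcribed to C. */
static void shuffle_scalar(float a[4], const float b[4]) {
    a[0] = a[3];
    a[1] = a[2];
    a[2] = b[0];
    a[3] = b[1];
}

/* Same result with one SHUFPS. _MM_SHUFFLE(z, y, x, w) yields
   {first[w], first[x], second[y], second[z]}: the low lanes come from
   the first operand, the high lanes from the second. */
static void shuffle_sse(float a[4], const float b[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(a, _mm_shuffle_ps(va, vb, _MM_SHUFFLE(1, 0, 2, 3)));
}

/* 4-wide dot product using SSE1 only. */
static float dot4(const float a[4], const float b[4]) {
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    /* p + {p1, p0, p3, p2} = {p0+p1, p0+p1, p2+p3, p2+p3} */
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    /* s + {s2, s3, s0, s1} puts the full sum in every lane */
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
}
```

The contrast is the point of the question: with intrinsics the programmer states the shuffle directly, while with plain D array code the single instruction only appears if the optimizer recognizes the whole assignment pattern.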

If we get alignment support and lots of work goes into optimizing vector 
ops for this, then we can go a long way without intrinsics, but I don't 
think we'll ever be able to completely remove the need for intrinsics.