A little Py Vs C++

Fri Nov 2 14:32:29 PDT 2012

On Friday, 2 November 2012 at 14:22:34 UTC, Jens Mueller wrote:
> But the compiler knows about the alignment, doesn't it?
>
> align(16) float[4] a;
> vs
> float[4] a;
>
> In the former case the compiler can generate better code and it should.
> The above syntax is not supported. But my point is all the compiler
> cares about is the alignment which can be specified in the code somehow.
> Sorry for being stubborn.
>
> Jens

Note: My knowledge of SIMD/SSE is fairly limited, and may be 
somewhat out of date. In other words, some of this may be flat 
out wrong.

First, just because something can have SIMD operations performed 
on it doesn't mean you necessarily want that. SSE instructions, 
for example, operate on values held in the XMM registers, and 
accessing the individual elements of a vector there is expensive. 
When using SSE, you want to avoid accessing individual elements 
as much as possible; not doing so tends to hurt performance quite 
badly. Yet when you just have a float[4], you may be accessing 
individual elements frequently or hardly at all. The compiler 
can't know whether you mostly use it as a single SIMD vector or 
mostly use it to store 4 separate elements. You could be aligning 
it for any reason, so alignment alone isn't a reliable signal of 
which one you intend.
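
To make the whole-vector vs. per-element distinction concrete, 
here is a rough sketch using the C++ SSE intrinsics (x86 only). 
The helper names add4 and lane are made up for illustration. The 
round trip through memory in lane is just one way to read a 
single element; I'm not claiming it is what any particular 
compiler emits.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86/x86-64 target

// Hypothetical helper: adds two 4-float arrays with a single addps.
// _mm_load_ps and _mm_store_ps require 16-byte aligned pointers,
// so the alignment is checked up front.
inline void add4(const float* a, const float* b, float* out) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    __m128 va = _mm_load_ps(a);             // whole vector into an XMM register
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));  // one addps covers all four lanes
}

// Hypothetical helper: reads one lane of a vector by spilling the
// whole register to memory first. This extra traffic is the kind of
// cost that makes per-element access unattractive in SIMD code.
inline float lane(__m128 v, int i) {
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);
    return tmp[i];
}
```

Calling add4 on three alignas(16) float[4] arrays does the whole 
addition in one instruction, while every call to lane pays for a 
store and a reload.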

Secondly, the compiler can't really know at compile time which 
SIMD instructions the target CPU supports. It's safe to say SSE2 
is supported by pretty much all x86 CPUs at this point, but 
something like SSE4.2 may not be. Just because the CPU that 
compiles the program supports an instruction set doesn't mean the 
CPU that runs it will.
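
The usual workaround is to check at run time and pick a code path 
then. A minimal sketch, assuming GCC or Clang on x86: 
__builtin_cpu_supports is a compiler builtin, not standard C++, 
and cpu_has_sse42 and pick_kernel are names I made up for the 
example.

```cpp
#include <cassert>
#include <string>

// Asks the CPU the program is actually running on whether it has
// SSE4.2. GCC/Clang builtin, x86 only (an assumption of this sketch).
inline bool cpu_has_sse42() {
    __builtin_cpu_init();  // harmless to call more than once
    return __builtin_cpu_supports("sse4.2") != 0;
}

// Hypothetical dispatcher: returns the name of the code path to use.
// A real program would return a function pointer to a kernel built
// for that instruction set, with SSE2 as the baseline fallback.
inline std::string pick_kernel() {
    return cpu_has_sse42() ? "sse4.2" : "sse2";
}
```

The decision is made on the machine that runs the binary, which 
is exactly the information the compiler doesn't have.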

Lastly, we'd still need SIMD intrinsics. It may be simple to tell 
that a float[4] + float[4] operation could use addps, but it 
would be more difficult to determine when to use something like 
dpps (dot product across two SIMD vectors) and various other 
instructions. And that's before considering non-x86 
architectures, which have different instruction sets entirely.
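
As an illustration of why intrinsics are still needed, here is a 
sketch of a 4-element dot product written with plain SSE 
intrinsics in C++ (x86 only). The function name dot4 is 
hypothetical. On CPUs with SSE4.1 the multiply and the horizontal 
sum can be done by a single DPPS instruction, but a compiler 
would have to recognise the scalar version of this whole pattern 
before it could emit either form.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86/x86-64 target

// Hypothetical helper: dot product of two 4-float arrays using only
// baseline SSE. Unaligned loads are used so no alignment is assumed.
inline float dot4(const float* a, const float* b) {
    // Lane-wise products: [p0, p1, p2, p3]
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // Swap neighbours within each pair: [p1, p0, p3, p2]
    __m128 swapped = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    // Pair sums: [p0+p1, p0+p1, p2+p3, p2+p3]
    __m128 sums = _mm_add_ps(prod, swapped);
    // Bring the upper pair sum down to lane 0: [p2+p3, p2+p3, ...]
    __m128 hi = _mm_movehl_ps(sums, sums);
    // Lane 0 now holds p0+p1+p2+p3; extract it as a scalar.
    return _mm_cvtss_f32(_mm_add_ss(sums, hi));
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8} this returns 70.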