A little Py Vs C++

Walter Bright newshound2 at digitalmars.com
Fri Nov 2 11:02:20 PDT 2012


On 11/2/2012 3:50 AM, Jens Mueller wrote:
 > Okay. For me they look the same. Can you elaborate, please? Assume I
 > want to add two float vectors which is common in both games and
 > scientific computing. The only difference is in games their length is
 > usually 3 or 4 whereas in scientific computing they are of arbitrary
 > length. Why do I need intrinsics to support the game setting?

Another excellent question.

Most languages have taken the "auto-vectorization" approach of reverse 
engineering loops to turn them into high-level constructs, and then compiling 
those constructs into special SIMD instructions.
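
To make that concrete, here is the sort of plain scalar loop an auto-vectorizer 
has to analyze and prove safe before it can rewrite it into packed SIMD 
instructions (a minimal D sketch; the function name is made up):

// Scalar loop: the auto-vectorizer must prove the iterations are
// independent and the arrays don't overlap before it can rewrite
// this into packed adds.
void addArrays(float[] result, const(float)[] a, const(float)[] b)
{
    foreach (i; 0 .. result.length)
        result[i] = a[i] + b[i];
}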

How to do this is explained in detail in the (rare) book "The Software 
Vectorization Handbook" by Bik, which I fortunately was able to obtain a copy of.

This struck me as a terrible approach, however. It just seemed stupid to try to 
teach the compiler to reverse engineer low-level code back into high-level code. A 
better design would be to start with high-level code in the first place. Hence 
the appearance of D vector operations.
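
For reference, D's vector operations let you state the whole-array computation 
directly, which the compiler is then free to lower to SIMD code (same made-up 
function as above, rewritten with array ops):

void addArrays(float[] result, const(float)[] a, const(float)[] b)
{
    // Element-wise over the whole slices, no explicit loop;
    // the operand lengths must match.
    result[] = a[] + b[];
}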

The trouble with D vector operations, however, is that they are too general 
purpose. The SIMD instructions are very quirky, and it's easy to unwittingly and 
silently cause the compiler to generate absolutely terrible, slow code. The 
reasons for that are the alignment requirements, coupled with the SIMD 
instructions not being orthogonal - some operations work for some types and not 
for others, in ways that are unintuitive unless you've read the SIMD specs 
carefully.

Just saying align(16) isn't good enough, because the vector ops work on slices and 
those slices aren't always aligned. So each operation has to check alignment at 
runtime, which is murder on performance.
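
A sketch of the problem: even if the array itself is 16-byte aligned, a slice 
taken at an odd element offset is not, so any SIMD lowering of the array op 
below needs a runtime check or unaligned loads (function name is made up, and 
it assumes the two arrays have equal length):

void smooth(float[] data, const(float)[] delta)
{
    // data.ptr may be 16-byte aligned, but data[1 .. $] starts 4
    // bytes past it, so this vector op cannot assume aligned
    // loads and stores.
    data[1 .. $] += delta[1 .. $];
}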

If a particular vector op for a particular type has no SIMD support, then the 
compiler has to generate workaround code. This can also have terrible 
performance consequences.

So the user writes vector code, benchmarks it, finds zero improvement, and the 
reasons why are elusive to anyone but an expert SIMD programmer.

(Auto-vectorizing technology has similar issues, pretty much meaning you won't 
get fast code out of it unless you've got a habit of examining the assembler 
output and tweaking as necessary.)

Enter Manu, who has a lot of experience making SIMD work for games. His proposal 
was:

1. Have native SIMD types. This will guarantee alignment, and will guarantee a 
compile-time error for SIMD types that are not supported by the CPU.

2. Have the compiler issue an error for SIMD operations that are not supported 
by the CPU, rather than silently generating inefficient workaround code.

3. There are all kinds of weird but highly useful SIMD instructions that don't 
have a straightforward representation in high-level code, such as saturated 
arithmetic. Manu's answer was to expose these instructions via intrinsics, so the 
user can string them together and be sure they will generate real SIMD 
instructions, while the compiler deals with register allocation (see the sketch 
after this list).
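
A rough sketch of what that looks like with the core.simd that ships with dmd 
(this is not Manu's actual library work; the function names are made up, and 
the exact __simd overloads and XMM enum members should be checked against your 
compiler's core.simd):

import core.simd;

float4 add4(float4 a, float4 b)
{
    // Native SIMD type: 16-byte alignment is guaranteed, and the
    // addition maps straight to ADDPS.
    return a + b;
}

short8 addSaturated(short8 a, short8 b)
{
    // Saturated arithmetic has no high-level operator, so it is
    // exposed as an intrinsic; PADDSW is the SSE2 saturated add
    // of signed 16-bit values.
    return cast(short8) __simd(XMM.PADDSW, a, b);
}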

This approach works, is inlineable, generates code as good as hand-built 
assembler, and is usable by regular programmers.

I won't say there aren't better approaches, but this one we know works.


