Very simple SIMD programming

bearophile bearophileHUGS at lycos.com
Wed Oct 24 15:00:00 PDT 2012


Manu:

> The compiler would have to do some serious magic to optimise 
> that;
> flattening both sides of the if into parallel expressions, and 
> then applying the mask to combine...

I think it's a small amount of magic.

The simple features shown in that paper are fully focused on SIMD 
programming, so they don't introduce anything that is clearly 
inefficient.
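
To make the "flatten both sides and combine with a mask" idea 
concrete, here is a minimal sketch in plain C. It is not taken from 
the paper; the function names and the operation (double the positive 
elements, negate the others) are made up for illustration, and it 
assumes a 32-bit float. The second function computes both sides of 
the if for every element and then picks one with a bit mask, which 
is roughly the per-lane shape a vectorizer would emit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar form: one branch per element. */
void select_branchy(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (in[i] > 0.0f)
            out[i] = in[i] * 2.0f;
        else
            out[i] = -in[i];
    }
}

/* Flattened form: both sides are evaluated unconditionally, then a
   mask of all ones or all zeros selects the result bit by bit. */
void select_flat(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        float then_val = in[i] * 2.0f;   /* "then" side */
        float else_val = -in[i];         /* "else" side */
        /* 0xFFFFFFFF when the condition holds, 0 otherwise. */
        uint32_t mask = (uint32_t)0 - (uint32_t)(in[i] > 0.0f);
        uint32_t t, e, r;
        memcpy(&t, &then_val, sizeof t); /* reinterpret float bits */
        memcpy(&e, &else_val, sizeof e);
        r = (t & mask) | (e & ~mask);    /* combine through the mask */
        memcpy(&out[i], &r, sizeof r);
    }
}
```

Both functions give the same results; the flattened one does a little 
extra work per element but has no branch, which is the trade that 
makes the transformation cheap on SIMD units.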


> I'm personally not in favour of SIMD constructs that are 
> anything less than optimal (but I appreciate I'm probably in 
> the minority here).
>
>> (The simple benchmarks of the paper show a 5-15% performance 
>> loss compared to handwritten SIMD code.)
>
> Right, as I suspected.

A 15% performance loss is very small if the programmer's 
alternative is writing scalar code, which is 2 or 3 times slower 
:-)

The SIMD programmers who can't stand a 1% loss of performance 
use the intrinsics manually (or write in asm) and ignore 
everything else.

A much larger population of system programmers wishes to use modern 
CPUs efficiently, but they don't have the time (or the skill, which 
means their programs are too often buggy) for assembly-level 
programming. Currently they use smart numerical C++ libraries, 
use modern Fortran versions, and/or write scalar C/C++ (or 
Fortran) code, add "restrict" annotations, and take a look at the 
produced asm hoping the modern compiler back-ends will vectorize 
it. This is not good enough, and the performance loss is far 
greater than 15%.
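
For reference, the scalar-code-plus-"restrict" style looks roughly 
like the sketch below. The function name and operation are only an 
illustration. The C99 restrict qualifier promises the compiler that 
the two arrays do not overlap, which removes one obstacle to 
auto-vectorization, but nothing in the source guarantees that SIMD 
code comes out; that depends on the compiler and its flags, hence 
the need to read the asm (or, with GCC, a report such as the one 
from -fopt-info-vec).

```c
#include <stddef.h>

/* y[i] = k * x[i] + y[i] for n elements.
   restrict: the caller promises that x and y never alias, so the
   compiler may load and store several elements at once if it
   decides to vectorize the loop. */
void scale_add(size_t n, float k,
               const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = k * x[i] + y[i];
}
```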

This paper shows a third way, making this kind of programming 
simpler and more approachable for a wider audience, with a small 
performance loss compared to handwritten code. This is what 
language designers have been doing for 60+ years :-)

Bye,
bearophile

