Very simple SIMD programming
bearophile
bearophileHUGS at lycos.com
Wed Oct 24 15:00:00 PDT 2012
Manu:
> The compiler would have to do some serious magic to optimise that;
> flattening both sides of the if into parallel expressions, and then
> applying the mask to combine...
I think it's only a small amount of magic.
The simple features shown in that paper are designed entirely
around SIMD programming, so they don't introduce constructs that
are obviously inefficient.
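To make the "flattening plus mask" idea concrete, here is a minimal sketch in C (not D) of if-conversion, using the GCC/Clang `vector_size` extension, which is not ISO C. The function names `branchy` and `masked` and the example condition (double positive values, negate the others) are hypothetical, chosen only for illustration; this is not the paper's code or its compilation scheme.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension (not ISO C): four 32-bit lanes in 128 bits. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical scalar reference: a real branch per element. */
void branchy(const int32_t *x, int32_t *out, int n) {
    for (int i = 0; i < n; i++) {
        if (x[i] > 0)
            out[i] = x[i] * 2;   /* "then" side */
        else
            out[i] = -x[i];      /* "else" side */
    }
}

/* If-converted version: both sides are computed for every lane,
   then a comparison mask selects the right result per lane. */
void masked(const int32_t *x, int32_t *out, int n) {
    const v4si zero = {0, 0, 0, 0};
    /* Processes only full groups of 4; any tail elements are left untouched. */
    for (int i = 0; i + 4 <= n; i += 4) {
        v4si v;
        memcpy(&v, x + i, sizeof v);        /* alignment-safe load */
        v4si mask   = v > zero;             /* -1 (all bits set) where true, 0 where false */
        v4si then_v = v * 2;                /* "then" side, every lane */
        v4si else_v = -v;                   /* "else" side, every lane */
        v4si r = (then_v & mask) | (else_v & ~mask);   /* blend by mask */
        memcpy(out + i, &r, sizeof r);      /* store the 4 results */
    }
}
```

Both functions should produce the same results; the masked one pays for both sides of the `if` on every lane, which is the trade-off the compiler has to accept when it flattens a branch this way.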
> I'm personally not in favour of SIMD constructs that are anything
> less than optimal (but I appreciate I'm probably in the minority
> here).
>
>> (The simple benchmarks of the paper show a 5-15% performance
>> loss compared to handwritten SIMD code.)
>
> Right, as I suspected.
A 15% loss is very small if the programmer's alternative is
writing scalar code that is 2 or 3 times slower :-)
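Putting the two figures side by side, assuming (as above) that the 15% is measured against handwritten SIMD code and that plain scalar code is 2 to 3 times slower than that same handwritten code:

```latex
\begin{aligned}
T_{\text{paper}} &\le 1.15\,T_{\text{hand}}, \qquad T_{\text{scalar}} = k\,T_{\text{hand}}, \quad k \in [2, 3] \\
\frac{T_{\text{scalar}}}{T_{\text{paper}}} &\ge \frac{k}{1.15} \in [1.74,\ 2.61]
\end{aligned}
```

So even with the worst-case 15% loss, the higher-level SIMD code would still be roughly 1.7 to 2.6 times faster than the scalar code.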
SIMD programmers who can't accept even a 1% performance loss
use intrinsics by hand (or write asm), and they ignore everything
else.
A much larger population of systems programmers want to use
modern CPUs efficiently, but they lack the time (or the skill,
which means their programs are too often buggy) for
assembly-level programming. Today they use smart numerical C++
libraries or modern Fortran versions, and/or write scalar C/C++
(or Fortran) code, add "restrict" annotations, and inspect the
generated asm, hoping that modern compiler back-ends will
vectorize it. This is not good enough, and the resulting loss is
far bigger than 15%.
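For reference, this is roughly what the "scalar code plus restrict" approach looks like: a minimal sketch in C, where the function name `saxpy` is just the conventional name for this kernel and not something taken from the paper. Whether the loop actually gets vectorized depends on the compiler, its version and its flags; instead of reading the asm, GCC can report vectorized loops with `-fopt-info-vec` and Clang with `-Rpass=loop-vectorize`.

```c
#include <stddef.h>

/* Plain scalar loop that a vectorizing back-end may turn into SIMD code.
   `restrict` promises the compiler that x and y do not overlap, which
   removes the aliasing obstacle to vectorizing the loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* y = a*x + y, element by element */
}
```

If `x` and `y` do overlap, the `restrict` promise is false and the behavior is undefined, which is part of why this approach is fragile: the programmer states the property, but nothing checks it.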
The paper shows a third way: it makes this kind of programming
simpler and accessible to a wider audience, at the cost of a
small performance loss compared to handwritten code. This is what
language designers have been doing for 60+ years :-)
Bye,
bearophile
More information about the Digitalmars-d mailing list