Very simple SIMD programming

Wed Oct 24 16:01:09 PDT 2012

On 24 October 2012 23:46, Manu <turkeyman at gmail.com> wrote:
> On 25 October 2012 01:00, bearophile <bearophileHUGS at lycos.com> wrote:
>>
>> Manu:
>>
>>
>>> The compiler would have to do some serious magic to optimise that;
>>> flattening both sides of the if into parallel expressions, and then
>>> applying the mask to combine...
>>
>>
>> I think it's a small amount of magic.
>>
>> The simple features shown in that paper are fully focused on SIMD
>> programming, so they don't introduce anything that is clearly inefficient.
>>
>>
>>> I'm personally not in favour of SIMD constructs that are anything less
>>> than optimal (but I appreciate I'm probably in the minority here).
>>>
>>>> (The simple benchmarks of the paper show a 5-15% performance loss
>>>> compared to handwritten SIMD code.)
>>>
>>> Right, as I suspected.
>>
>>
>> 15% is a very small performance loss, if for the programmer the
>> alternative is writing scalar code, that is 2 or 3 times slower :-)
>>
>> The SIMD programmers that can't stand a 1% loss of performance use the
>> intrinsics manually (or write in asm) and they ignore all other things.
>>
>> A much larger population of system programmers wish to use modern CPUs
>> efficiently, but they don't have the time (or the skill, which means their
>> programs are too often buggy) for assembly-level programming. Currently they
>> use smart numerical C++ libraries, use modern Fortran versions, and/or write
>> scalar C/C++ (or Fortran) code, add "restrict" annotations, and take a look
>> at the produced asm hoping the modern compiler back-ends will vectorize it.
>> This is not good enough, and the loss is far larger than 15%.
>>
>> This paper shows a third way, making this kind of programming simpler and
>> more approachable for a wider audience, with a small performance loss
>> compared to handwritten code. This is what language designers have been
>> doing for 60+ years :-)
>
>
> I don't disagree with you, it is fairly cool!
> I can't imagine D adopting those sorts of language features any time
> soon, but it's probably possible.
> I guess the keys are defining the bool vector concept, and some tech to
> flatten both sides of a vector if statement, but that's far from simple...
> Particularly so if someone puts some unrelated code in those if blocks.
> Chances are it offers too much freedom that wouldn't be well used or
> understood by the average programmer, and that still leaves you in a similar
> land of only being particularly worthwhile in the hands of a fairly
> advanced/competent user.
> The main error that most people make is thinking SIMD code is faster by
> nature. Truth is, in the hands of someone who doesn't know precisely what
> they're doing, SIMD code is almost always slower.
> There are some cool new expressions offered here, fairly convenient
> (although easy[er?] to write in other ways too), but I don't think it would
> likely change that fundamental premise for the average programmer beyond
> some very simple parallel constructs that the compiler can easily get right.
> I'd certainly love to see it, but is it realistic that someone would take
> the time to do all of that any time soon when benefits are controversial? It
> may even open the possibility for un-skilled people to write far worse code.
>
> Let's consider your example above for instance, I would rewrite (given
> existing syntax):
>
> // vector length of context = 1; current_mask = T
> int4 v = [0,3,4,1];
> int4 w = 3; // [3,3,3,3] via broadcast
> uint4 m = maskLess(v, w); // [T,F,F,T] (T == ones, F == zeroes)
> v += int4(1); // [1,4,5,2]
>
> // the if block is trivially rewritten:
> int4 trueSide = v + int4(2);
> int4 falseSide = v + int4(3);
> v = select(m, trueSide, falseSide); // [3,7,8,4]
>
>

This should work....

int4 trueSide = v + 2;
int4 falseSide = v + 3;
....
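
For anyone who wants to check the lane values in the quoted example
without a SIMD-capable compiler, here is a minimal sketch in plain D.
It is only a scalar emulation: int[4] static arrays stand in for int4,
and maskLess and select are hypothetical helpers named after the ones in
the quoted code, not functions from core.simd or any library. It shows
the flatten-both-sides-then-select rewrite and the scalar broadcast on
array operations, and says nothing about the code a compiler would
generate for real vector types.

```d
// Scalar emulation of the quoted example; int[4] stands in for int4.
// maskLess and select are hypothetical helpers, not library functions.

/// Lane-wise a < b: a true lane is all ones, a false lane is zero.
uint[4] maskLess(const int[4] a, const int[4] b)
{
    uint[4] m;
    foreach (i; 0 .. 4)
        m[i] = a[i] < b[i] ? uint.max : 0;
    return m;
}

/// Lane-wise blend: take t where the mask is set, f elsewhere.
int[4] select(const uint[4] m, const int[4] t, const int[4] f)
{
    int[4] r;
    foreach (i; 0 .. 4)
        r[i] = cast(int)((t[i] & m[i]) | (f[i] & ~m[i]));
    return r;
}

/// The quoted if/else, flattened into two sides plus a select.
int[4] flattenedIf()
{
    int[4] v = [0, 3, 4, 1];
    int[4] w = 3;                  // scalar initializer fills every lane
    const m = maskLess(v, w);      // [T,F,F,T]
    v[] += 1;                      // [1,4,5,2]

    int[4] trueSide, falseSide;
    trueSide[] = v[] + 2;          // scalar operand applied to each lane
    falseSide[] = v[] + 3;
    return select(m, trueSide, falseSide);
}

unittest
{
    const r = flattenedIf();
    assert(r == [3, 7, 8, 4]);     // matches the comment in the quoted code
}
```

Building with "dmd -unittest -main" runs the check. Both sides of the
branch are always computed here, which is exactly the cost being
discussed above when someone puts unrelated code in the if blocks.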


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';