DConf 2013 Day 3 Talk 5: Effective SIMD for modern architectures by Manu Evans

Thu Jun 20 06:11:30 PDT 2013

On 20 June 2013 21:58, bearophile <bearophileHUGS at lycos.com> wrote:

> Andrei Alexandrescu:
>
>  http://youtube.com/watch?v=q_39RnxtkgM
>>
>
> Very nice.
>
> - - - - - - - - - - - - - - - - - - -
>
> Slide 3:
>
>  In practise, say we have iterative code like this:
>>
>> int data[100];
>>
>> for(int i = 0; i < data.length; ++i) {
>>   data[i] += 10; }
>>
>
> For code like that in D we have vector ops:
>
> int[100] data;
> data[] += 10;
>
>
> Regarding vector ops: currently they are written with handwritten asm that
> uses SIMD where possible. Once std.simd is in good shape I think the array
> ops can be rewritten (and completed in their missing parts) using a higher
> level style of coding.
>

I was trying to illustrate a process, not so much comment on D array
syntax.
The problem with auto-SIMD applied to array operations is that D doesn't
guarantee that arrays are aligned, nor that their lengths are multiples of
'N' elements, which means the compiler loses the opportunity to make the
assumptions that make the biggest performance difference.
For the fast path, the data must be aligned and a multiple of N elements
long. By using explicit SIMD types, you're forced to adhere to those rules
as a programmer, and the compiler can optimise properly.
You take on the responsibility of handling misalignment and stragglers
yourself, and can perhaps make less conservative choices.
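
As a rough sketch of what taking on that responsibility looks like, here is
the slide 3 loop split into a scalar head, an aligned vector body, and a
scalar tail. The name addTen is made up for illustration (it is not from
std.simd or Phobos), and the code assumes a target where core.simd's int4
is available and 16-byte alignment is what the hardware wants:

```d
import core.simd;

// Hypothetical helper for illustration only.
// Adds 10 to every element of an arbitrarily aligned int slice.
void addTen(int[] data)
{
    size_t i = 0;

    // Head: scalar steps until the current element sits on a
    // 16-byte boundary (or we run out of data).
    while (i < data.length && ((cast(size_t) &data[i]) & 15) != 0)
    {
        data[i] += 10;
        ++i;
    }

    // Body: only whole, aligned int4 vectors, so the compiler is
    // free to use aligned loads and stores.
    int4 tens = 10; // broadcast 10 into all four lanes
    for (; i + 4 <= data.length; i += 4)
    {
        auto p = cast(int4*) &data[i];
        *p += tens;
    }

    // Tail: the stragglers that do not fill a whole vector.
    for (; i < data.length; ++i)
        data[i] += 10;
}
```

If you control the allocation, you can pad and align the buffer up front
and drop the head and tail loops entirely, which is the less conservative
choice I mean.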

- - - - - - - - - - - - - - - - - - -
>
> Slide 22:
>
>  Comparisons:
>> Full suite of comparisons Can produce bit-masks, or boolean 'any'/'all'
>> logic.
>>
>
> Maybe a little compiler support (for the syntax) will help here.
>

Well, each is a valid comparison in different situations. I'm not sure how
syntax could clearly select the one you want.
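
To make the distinction concrete, here is a plain scalar model of the three
flavours, written with ordinary 4-element arrays rather than real SIMD
types. All of the function names (maskGreater, anyLane, allLanes, select)
are invented for this sketch and are not the std.simd API:

```d
// Lane-wise "a > b": each lane becomes all ones or all zeros,
// which is the shape of result a hardware SIMD compare produces.
uint[4] maskGreater(const float[4] a, const float[4] b)
{
    uint[4] m;
    foreach (i; 0 .. 4)
        m[i] = a[i] > b[i] ? 0xFFFF_FFFF : 0;
    return m;
}

// Boolean reductions over the mask: usable in an ordinary `if`.
bool anyLane(const uint[4] m)
{
    return (m[0] | m[1] | m[2] | m[3]) != 0;
}

bool allLanes(const uint[4] m)
{
    return (m[0] & m[1] & m[2] & m[3]) == 0xFFFF_FFFF;
}

// The mask form is what you want for branch-free per-lane selection.
float[4] select(const uint[4] m, const float[4] x, const float[4] y)
{
    float[4] r;
    foreach (i; 0 .. 4)
        r[i] = m[i] ? x[i] : y[i];
    return r;
}
```

The same `a > b` could reasonably mean any of these, depending on whether
you are about to branch or about to blend, which is why I don't see an
obvious single syntax for it.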

- - - - - - - - - - - - - - - - - - -
>
> Slide 26:
>
>  Always pass vectors by value.
>>
>
> Unfortunately it seems a bad idea to give a warning if you pass one of
> those by reference.
>

And I don't think it should. Passing by ref isn't 'wrong', you just
shouldn't do it if you care about performance.
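
For illustration, both of these are legal and give the same result. The
function names are made up, and the sketch assumes core.simd's float4 is
available on the target. Whether the by-ref version really costs a trip
through memory depends on the compiler and on inlining, so treat the
comments as the general expectation rather than a guarantee:

```d
import core.simd;

// By value: the arguments and the result can stay in SIMD registers.
float4 scaledByValue(float4 v, float4 s)
{
    return v * s;
}

// By reference: correct, but the callee is handed an address, so the
// vector generally has to live in memory across the call.
void scaleInPlace(ref float4 v, float4 s)
{
    v = v * s;
}
```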

- - - - - - - - - - - - - - - - - - -
>
> Slide 27:
>
>  3. Use ‘leaf’ functions where possible.
>>
>
> I am not sure how much good it is to enforce leaf functions with a @leaf
> annotation.
>

I don't think it would be useful. It should only be considered a general
rule when people are very specifically considering performance above all
else.
It's just a very important detail to be aware of when optimising your code,
particularly so when you're dealing with maths code (often involving simd).

- - - - - - - - - - - - - - - - - - -
>
> Slide 32:
>
>  Experiment with prefetching?
>>
>
> Are D intrinsics offering instructions to perform prefetching?
>

Well, GCC does at least. If you're worried about performance at this level,
you're probably already using GCC :)

- - - - - - - - - - - - - - - - - - -
>
> LDC2 supports SIMD on Windows32 too.
>
> So for this code:
>
>
> void main() {
>     alias double2 = __vector(double[2]);
>     auto a = new double[200];
>     auto b = cast(double2[])a;
>     double2 tens = [10.0, 10.0];
>     b[] += tens;
> }
>
>
> LDC2 compiles it to:
>
>         movl    $200, 4(%esp)
>         movl    $__D11TypeInfo_Ad6__initZ, (%esp)
>         calll   __d_newarrayiT
>         movl    %edx, %esi
>         movl    %eax, (%esp)
>         movl    $16, 8(%esp)
>         movl    $8, 4(%esp)
>         calll   __d_array_cast_len
>         testl   %eax, %eax
>         je      LBB0_3
>         movapd  LCPI0_0, %xmm0
>         .align  16, 0x90
> LBB0_2:
>         movapd  (%esi), %xmm1
>         addpd   %xmm0, %xmm1
>         movapd  %xmm1, (%esi)
>         addl    $16, %esi
>         decl    %eax
>         jne     LBB0_2
> LBB0_3:
>         xorl    %eax, %eax
>         addl    $12, %esp
>         popl    %esi
>         ret
>
>
> It uses addpd, which works on two doubles at the same time.
>

Sure... did I say this wasn't supported somewhere? Sorry if I gave that
impression.

- - - - - - - - - - - - - - - - - - -
>
> The Reddit thread contains a link to this page, a compiler for a C variant
> from Intel that's optimized for SIMD:
> http://ispc.github.io/
>
> Some of the syntax of ispc:
>
> - - - - - -
>
> The first of these statements is cif, indicating an if statement that is
> expected to be coherent. The usage of cif in code is just the same as if:
>
> cif (x < y) {
>     ...
> } else {
>     ...
> }
>
> cif provides a hint to the compiler that you expect that most of the
> executing SPMD programs will all have the same result for the if condition.
>
> Along similar lines, cfor, cdo, and cwhile check to see if all program
> instances are running at the start of each loop iteration; if so, they can
> run a specialized code path that has been optimized for the "all on"
> execution mask case.
>

This is interesting. I didn't know about this.