<div dir="ltr">On 20 June 2013 21:58, bearophile <span dir="ltr"><<a href="mailto:bearophileHUGS@lycos.com" target="_blank">bearophileHUGS@lycos.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Andrei Alexandrescu:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<a href="http://youtube.com/watch?v=q_39RnxtkgM" target="_blank">http://youtube.com/watch?v=q_39RnxtkgM</a><br>

</blockquote>

<br>

Very nice.<br>

<br>

- - - - - - - - - - - - - - - - - - -<br>

<br>

Slide 3:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

In practice, say we have iterative code like this:<br>

<br>

int data[100];<br>

<br>

for(int i = 0; i < data.length; ++i) {<br>

  data[i] += 10; }<br>

</blockquote>

<br>

For code like that in D we have vector ops:<br>

<br>

int[100] data;<br>

data[] += 10;<br>

<br>

<br>

Regarding vector ops: currently they are implemented with handwritten asm that uses SIMD where possible. Once std.simd is in good shape, I think the array ops can be rewritten (and their missing parts completed) in a higher-level style of coding.<br>

</blockquote><div><br></div><div style>I was trying to illustrate a process, not so much comment on D array syntax.</div><div style>The problem with auto-SIMD applied to array operations is that D doesn't guarantee that arrays are aligned, nor that their lengths are multiples of 'N' elements. That means the compiler loses the opportunity to make the assumptions that make the biggest performance difference.</div>

<div style>To get that performance, arrays must be aligned and their lengths must be multiples of N elements. By using explicit SIMD types, you're forced to adhere to those rules as a programmer, and the compiler can optimise properly.</div><div style>You take on the responsibility for handling misalignment and stragglers yourself, and can perhaps make less conservative choices.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- - - - - - - - - - - - - - - - - - -<br>

<br>

Slide 22:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Comparisons:<br>

Full suite of comparisons Can produce bit-masks, or boolean 'any'/'all' logic.<br>

</blockquote>

<br>

Maybe a little compiler support (for the syntax) would help here.<br></blockquote><div><br></div><div style>Well, each is a valid comparison in different situations. I'm not sure how syntax could clearly select the one you want.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- - - - - - - - - - - - - - - - - - -<br>

<br>

Slide 26:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Always pass vectors by value.<br>

</blockquote>

<br>

Unfortunately it seems a bad idea to give a warning if you pass one of those by reference.<br></blockquote><div><br></div><div style>And I don't think it should. Passing by ref isn't 'wrong', you just shouldn't do it if you care about performance.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- - - - - - - - - - - - - - - - - - -<br>

<br>

Slide 27:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

3. Use ‘leaf’ functions where possible.<br>

</blockquote>

<br>

I am not sure how much good it is to enforce leaf functions with a @leaf annotation.<br></blockquote><div><br></div><div style>I don't think it would be useful. It should only be considered a general rule when people are very specifically considering performance above all else.</div>

<div style>It's just a very important detail to be aware of when optimising your code, particularly when you're dealing with maths code (which often involves SIMD).</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


- - - - - - - - - - - - - - - - - - -<br>

<br>

Slide 32:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Experiment with prefetching?<br>

</blockquote>

<br>

Do D intrinsics offer instructions to perform prefetching?<br></blockquote><div><br></div><div style>Well, GCC does at least. If you're worried about performance at this level, you're probably already using GCC :)</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- - - - - - - - - - - - - - - - - - -<br>

<br>

LDC2 supports SIMD on Windows32 too.<br>

<br>

So for this code:<br>

<br>

<br>

void main() {<br>

    alias double2 = __vector(double[2]);<br>

    auto a = new double[200];<br>

    auto b = cast(double2[])a;<br>

    double2 tens = [10.0, 10.0];<br>

    b[] += tens;<br>

}<br>

<br>

<br>

LDC2 compiles it to:<br>

<br>

        movl    $200, 4(%esp)<br>

        movl    $__D11TypeInfo_Ad6__initZ, (%esp)<br>

        calll   __d_newarrayiT<br>

        movl    %edx, %esi<br>

        movl    %eax, (%esp)<br>

        movl    $16, 8(%esp)<br>

        movl    $8, 4(%esp)<br>

        calll   __d_array_cast_len<br>

        testl   %eax, %eax<br>

        je      LBB0_3<br>

        movapd  LCPI0_0, %xmm0<br>

        .align  16, 0x90<br>

LBB0_2:<br>

        movapd  (%esi), %xmm1<br>

        addpd   %xmm0, %xmm1<br>

        movapd  %xmm1, (%esi)<br>

        addl    $16, %esi<br>

        decl    %eax<br>

        jne     LBB0_2<br>

LBB0_3:<br>

        xorl    %eax, %eax<br>

        addl    $12, %esp<br>

        popl    %esi<br>

        ret<br>

<br>

<br>

It uses addpd, which operates on two doubles at the same time.<br></blockquote><div><br></div><div style>Sure... did I say somewhere that this wasn't supported? Sorry if I gave that impression.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


- - - - - - - - - - - - - - - - - - -<br>

<br>

The Reddit thread contains a link to this page, a compiler for a C variant from Intel that's optimized for SIMD:<br>

<a href="http://ispc.github.io/" target="_blank">http://ispc.github.io/</a><br>

<br>

Some of the syntax of ispc:<br>

<br>

- - - - - -<br>

<br>

The first of these statements is cif, indicating an if statement that is expected to be coherent. The usage of cif in code is just the same as if:<br>

<br>

cif (x < y) {<br>

    ...<br>

} else {<br>

    ...<br>

}<br>

<br>

cif provides a hint to the compiler that you expect that most of the executing SPMD programs will all have the same result for the if condition.<br>

<br>

Along similar lines, cfor, cdo, and cwhile check to see if all program instances are running at the start of each loop iteration; if so, they can run a specialized code path that has been optimized for the "all on" execution mask case.<br>

</blockquote><div><br></div><div style>This is interesting. I didn't know about this.</div></div></div></div>