By ref and by pointer kills performance.

Richard (Rikki) Andrew Cattermole richard at cattermole.co.nz
Wed Feb 14 05:30:54 UTC 2024


Here is an example from something I wrote in reply to a Reddit post:

Using LDC, I was able to get this code to process 8.8 million vectors in
around 1 second.

```d
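module strategy;

int run(byte[] a, byte[] b) {
     // The unrolled loop reads 64 elements per iteration, so both slices
     // must have the same length and that length must be a multiple of 64.
     assert(a.length == b.length);
     assert(a.length % 64 == 0);

     int sum = 0;

     for(size_t i; i < a.length; i += 64) {
         // static foreach expands this body 64 times at compile time, so
         // the optimizer sees a straight-line block of 64 multiply-adds.
         static foreach(j; 0 .. 64) {
             sum += cast(short)a[i + j] * cast(short)b[i + j];
         }
     }

     return sum;
}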
```
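The unrolled `run` only works when both slices have the same length and that length is a multiple of 64; otherwise `a[i + j]` indexes past the end. Below is a minimal sketch of one way to lift that restriction, assuming a plain scalar loop is acceptable for the leftover elements. The names `runAnyLength` and `strategy_tail` are hypothetical and do not come from the original post.

```d
module strategy_tail;

/// Dot product of two byte slices of equal, arbitrary length.
/// Hypothetical variant of `run`: the main loop is unrolled by 64 as in
/// the original, and a scalar loop handles the leftover elements.
int runAnyLength(const(byte)[] a, const(byte)[] b)
{
    assert(a.length == b.length);

    enum width = 64;
    int sum = 0;

    // Largest multiple of 64 that fits in the input.
    immutable size_t blocked = a.length - a.length % width;

    size_t i = 0;
    for (; i < blocked; i += width)
    {
        // Expanded 64 times at compile time, exactly like the original.
        static foreach (j; 0 .. width)
        {
            sum += cast(short) a[i + j] * cast(short) b[i + j];
        }
    }

    // Scalar tail: at most 63 remaining elements.
    for (; i < a.length; ++i)
    {
        sum += cast(short) a[i] * cast(short) b[i];
    }

    return sum;
}

unittest
{
    byte[] a = [1, 2, 3];
    byte[] b = [4, 5, 6];
    assert(runAnyLength(a, b) == 32); // 4 + 10 + 18
}
```

The tail loop only runs for the last `length % 64` elements, so the hot path is the same as in the original for long inputs.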

To achieve the same thing in Go, they had to write assembly (see the
DotVNNI example):

https://sourcegraph.com/blog/slow-to-simd

LDC is able to get close to the maximum throughput a CPU can deliver,
from code that looks almost naive. DMD cannot compete with this, nor
should it aim to.

Here is what actual naive code looks like for this problem:

```d
module strategy;

int run(byte[] a, byte[] b) {
     assert(a.length == b.length);

     int sum = 0;

     // One multiply-add per element; no manual unrolling.
     foreach(i; 0 .. a.length) {
         sum += cast(short)a[i] * cast(short)b[i];
     }

     return sum;
}
```

This version manages 7 million vectors per second.

They had to write assembly to get that speed; I got it without doing
anything special...
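The throughput figures above depend on how the benchmark is driven, which the post does not show. Below is a minimal sketch of one way such a measurement could be set up. It assumes one query vector is scored against `count` stored vectors of length `dim`, all filled with random data. The names `naive` and `vectorsPerSecond`, along with any parameter values, are hypothetical and are not the setup behind the 8.8 and 7 million figures.

```d
module bench;

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : Random, uniform;

/// Naive dot product, equivalent to the post's second `run`.
int naive(const(byte)[] a, const(byte)[] b)
{
    assert(a.length == b.length);
    int sum = 0;
    foreach (i; 0 .. a.length)
        sum += cast(short) a[i] * cast(short) b[i];
    return sum;
}

/// Hypothetical harness: scores one random query against `count` random
/// stored vectors of length `dim` and returns throughput in vectors/second.
/// The summed dot products are returned through `checksum` so the
/// optimizer cannot discard the calls being timed.
double vectorsPerSecond(alias dot)(size_t dim, size_t count,
                                   out long checksum, uint seed = 42)
{
    auto rng = Random(seed);
    auto query = new byte[dim];
    auto data = new byte[dim * count];
    foreach (ref x; query)
        x = cast(byte) uniform(-128, 128, rng);
    foreach (ref x; data)
        x = cast(byte) uniform(-128, 128, rng);

    // Only the scoring loop is timed, not the data generation.
    auto sw = StopWatch(AutoStart.yes);
    foreach (k; 0 .. count)
        checksum += dot(query, data[k * dim .. (k + 1) * dim]);
    sw.stop();

    immutable micros = sw.peek.total!"usecs";
    return micros > 0 ? count * 1e6 / micros : double.infinity;
}
```

For example, `vectorsPerSecond!naive(1024, 100_000, checksum)` would time the naive version. Passing the unrolled `run` instead (its 64-element blocks require `dim` to be a multiple of 64) would compare the two under the same conditions, and equal checksums confirm that they compute the same result. The returned numbers depend heavily on the compiler and flags (LDC with optimizations enabled versus DMD) and on the CPU, so they are only meaningful on one machine.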


More information about the Digitalmars-d mailing list