By ref and by pointer kills performance.
Richard (Rikki) Andrew Cattermole
richard at cattermole.co.nz
Wed Feb 14 05:30:54 UTC 2024
Here is an example from something I wrote in reply to a Reddit post:
Using LDC, I was able to get this code to process 8.8 million vectors in
around 1 second.
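```d
module strategy;

int run(byte[] a, byte[] b) {
    // The unrolled loop below reads 64 elements per iteration, so it
    // requires equal lengths that are a multiple of 64.
    assert(a.length == b.length);
    assert(a.length % 64 == 0);

    int sum = 0;
    for (size_t i; i < a.length; i += 64) {
        // static foreach expands this body 64 times at compile time,
        // leaving straight-line code for the optimizer to vectorize.
        static foreach (j; 0 .. 64) {
            sum += cast(short)a[i + j] * cast(short)b[i + j];
        }
    }
    return sum;
}
```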
To achieve the same thing in Go, they had to write assembly (see the
DotVNNI example):
https://sourcegraph.com/blog/slow-to-simd
LDC gets close to the maximum throughput the CPU can deliver, from code
that looks almost naive. DMD cannot compete with this, nor should it aim
to.
Here is what actual naive code looks like for this problem:
```d
module strategy;

int run(byte[] a, byte[] b) {
    assert(a.length == b.length);
    int sum = 0;
    foreach (i; 0 .. a.length) {
        sum += cast(short)a[i] * cast(short)b[i];
    }
    return sum;
}
```
That does 7 million vectors per second.
They had to write assembly to reach that speed. I got it without doing
anything special...
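The unrolled loop in the first example only works when both slices have the same length and that length is a multiple of 64. Otherwise the last chunk reads past the end. Below is a minimal sketch of one common way to lift that restriction: run the unrolled body over whole 64-element chunks, then finish the remainder with a plain scalar loop. The names `dotUnrolled` and `dotNaive` are hypothetical. Whether LDC vectorizes this variant as well as the original is an assumption that has not been measured here.

```d
// Hypothetical sketch: byte-wise dot product, unrolled by 64 with a scalar tail.
int dotUnrolled(const(byte)[] a, const(byte)[] b)
{
    assert(a.length == b.length);
    enum width = 64;                              // elements per unrolled chunk
    immutable full = a.length - a.length % width; // largest multiple of width
    int sum = 0;
    for (size_t i = 0; i < full; i += width)
    {
        // Expanded 64 times at compile time, as in the original example.
        static foreach (j; 0 .. width)
        {
            sum += cast(short) a[i + j] * cast(short) b[i + j];
        }
    }
    // Scalar loop for the remaining (length % width) elements.
    foreach (i; full .. a.length)
        sum += cast(short) a[i] * cast(short) b[i];
    return sum;
}

// Reference version, same as the naive code above.
int dotNaive(const(byte)[] a, const(byte)[] b)
{
    assert(a.length == b.length);
    int sum = 0;
    foreach (i; 0 .. a.length)
        sum += cast(short) a[i] * cast(short) b[i];
    return sum;
}

void main()
{
    import std.stdio : writeln;

    // 100 is deliberately not a multiple of 64, so the tail loop runs.
    auto a = new byte[100];
    auto b = new byte[100];
    foreach (i; 0 .. 100)
    {
        a[i] = cast(byte)(i - 50);
        b[i] = cast(byte)(3 - i % 7);
    }
    writeln(dotUnrolled(a, b) == dotNaive(a, b)); // prints true
}
```

Integer addition is associative, so summing in 64-wide chunks gives the same result as the naive loop as long as `sum` does not overflow.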