performance issues with SIMD function

Bogdan contact at szabobogdan.com
Sat Nov 4 11:06:02 UTC 2023


On Friday, 3 November 2023 at 15:32:08 UTC, Sergey wrote:
> On Friday, 3 November 2023 at 15:11:31 UTC, Bogdan wrote:
>> Hi everyone,
>>
>> I was playing around with the intel-intrinsics library, trying 
>> to improve the speed of a simple area function. I could not 
>> see any performance improvements from the non-SIMD 
>> implementation. The SIMD version is a little bit slower even 
>> with LDC2 and -O3. Can anyone help me to understand what I am 
>> missing?
>>
>> Thanks!
>> Bogdan
>
> Your SIMD algorithm does not gain much from using SIMD. 
> The length of the loop is the same.
> Also, the compiler is probably applying some optimizations to the 
> regular version that do almost the same thing.

I think it was from the way I was running the benchmark:

```d
   // Non-SIMD version: 100_000 calls timed as one block
   auto begin = Clock.currTime;
   foreach (i; 0..100_000) {
     res1 = areaMeters(polygon);
   }
   writeln("No SIMD ", Clock.currTime - begin);

   // SIMD version: same loop, same polygon
   begin = Clock.currTime;
   foreach (i; 0..100_000) {
     res2 = areaMetersSimd2(polygon);
   }
   writeln("SIMD    ", Clock.currTime - begin);
```

Running each function 100,000 times like this gives me:
```
   No SIMD 1 sec, 80 ms, 765 μs, and 1 hnsec
   SIMD    1 sec, 120 ms, 765 μs, and 1 hnsec
```


```d
   // Non-SIMD version: a single timed call
   auto begin = Clock.currTime;
   res1 = areaMeters(polygon);
   writeln("No SIMD ", Clock.currTime - begin);

   // SIMD version: a single timed call
   begin = Clock.currTime;
   res2 = areaMetersSimd2(polygon);
   writeln("SIMD    ", Clock.currTime - begin);
```


Timing a single call of each function instead gives me:
```
   No SIMD 19 μs and 3 hnsecs
   SIMD    16 μs and 8 hnsecs
```
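
For anyone comparing timings like the ones above, here is a minimal sketch of a harness that may give steadier numbers. It uses `StopWatch` from `std.datetime.stopwatch`, which is based on the monotonic clock, rather than the wall clock behind `Clock.currTime`. It also makes one untimed warm-up call and sums the results so the calls are not discarded as dead code. `shoelaceArea` and `timeIt` are hypothetical names made up for this example; `shoelaceArea` is only a stand-in, not the `areaMeters` or `areaMetersSimd2` from this thread.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Hypothetical stand-in for the function under test: shoelace area of a
// polygon given as [x, y] pairs. Not the areaMeters from this thread.
double shoelaceArea(const double[2][] polygon)
{
    double sum = 0;
    foreach (i; 0 .. polygon.length)
    {
        const a = polygon[i];
        const b = polygon[(i + 1) % polygon.length];
        sum += a[0] * b[1] - b[0] * a[1];
    }
    return sum / 2;
}

// Times `iterations` calls of `fn` after one untimed warm-up call.
// Results are added to `sink` so the compiler has to keep them.
Duration timeIt(alias fn)(const double[2][] polygon, size_t iterations,
                          ref double sink)
{
    sink += fn(polygon); // warm-up, not timed
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        sink += fn(polygon);
    sw.stop();
    return sw.peek();
}

void main()
{
    const double[2][] polygon = [[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]];
    double sink = 0;
    const elapsed = timeIt!shoelaceArea(polygon, 100_000, sink);
    writeln("shoelaceArea ", elapsed, " (checksum ", sink, ")");
}
```

Calling `timeIt!areaMeters` and `timeIt!areaMetersSimd2` on the same polygon would be the analogous comparison, assuming both accept the same argument type. An optimizing compiler may still hoist a call with a constant input out of the loop, so treat the numbers as indicative only.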





