auto vectorization observations

Bruce Carneal bcarneal at gmail.com
Wed Jun 8 18:41:44 UTC 2022


Auto vectorization (autovec) can yield significant performance 
improvements, but this back-end technology can struggle with some 
very simple forms, leaving you with lackluster performance.  When 
that happens in performance-critical code, D's __vector types 
come in handy.

ldc and gdc differ in their autovec capabilities, with gdc coming 
out ahead in at least one important area: dealing with 
conditionals.

As an example, gdc is able to vectorize the following for both 
ARM SVE and x86 SIMD architectures while ldc, per my godbolt 
testing at least, cannot.

```d
alias T = ubyte; // 16 wide (128 bit HW) to 64 wide (512 bit HW)
alias CT = const(T);

void choose(size_t n, CT* src, T threshold, CT* a, CT* b, T* dst)
{
     foreach(i; 0 .. n)
         dst[i] = src[i] < threshold ? a[i] : b[i];
}
```

You can handle conditionals manually in the __vector world, but 
it's tedious and error-prone, so kudos to Iain and the gcc crew.
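
To give a rough idea of what "manually" involves, here is a minimal 
sketch of the usual mask-and-blend pattern, written in scalar D so 
that it compiles anywhere.  The name chooseMasked is hypothetical and 
not from the original post.  A __vector version would apply the same 
bitwise blend to whole registers at a time; the comparison that 
produces the mask tends to depend on the compiler and target, so it 
is not shown here.

```d
// Branch-free equivalent of: dst[i] = src[i] < threshold ? a[i] : b[i]
// (hypothetical helper, for illustration only)
void chooseMasked(size_t n, const(ubyte)* src, ubyte threshold,
                  const(ubyte)* a, const(ubyte)* b, ubyte* dst)
{
    foreach (i; 0 .. n)
    {
        // All bits set (-1) when src[i] < threshold, zero otherwise.
        const int mask = -cast(int)(src[i] < threshold);

        // Blend: take a[i] where the mask is set, b[i] where it is clear.
        dst[i] = cast(ubyte)((a[i] & mask) | (b[i] & ~mask));
    }
}
```

The tedium in a real __vector version comes from everything around 
this core: building the mask per element width, keeping loads aligned, 
and handling the tail when n is not a multiple of the vector width.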

Additional observations wrt D and auto vectorization, good and 
bad, are welcome.


