auto vectorization notes

Sat Mar 28 05:21:14 UTC 2020

On Monday, 23 March 2020 at 18:52:16 UTC, Bruce Carneal wrote:
> When speeds are equivalent, or very close, I usually prefer 
> auto vectorized code to explicit SIMD/__vector code as it's 
> easier to read.  (on the downside you have to guard against 
> compiler code-gen performance regressions)
>
> One oddity I've noticed is that I sometimes need to use 
> pragma(inline, *false*) in order to get ldc to "do the right 
> thing". Apparently the compiler sees the costs/benefits 
> differently in the standalone context.
>
> More widely known techniques that have gotten people over the 
> serial/SIMD hump include:
>  1) simplified indexing relationships
>  2) known count inner loops (chunkify)
>  3) static foreach blocks (manual inlining that the compiler 
> "gathers")
>
> I'd be interested to hear from others regarding their auto 
> vectorization and __vector experiences.  What has worked and 
> what hasn't worked in your performance sensitive dlang code?
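
For anyone unfamiliar with items 2 and 3 in the quoted list, here is a rough sketch of what a "chunkified" loop with a static foreach body might look like. The function name sumChunked and the chunk width of 8 are made up for illustration, and whether ldc actually vectorizes this depends on the compiler version, flags and target, so treat it as a starting point and check the generated code.

```d
// Hypothetical chunk width, chosen for illustration; tune per target.
enum chunk = 8;

// Kept out of line, which the quoted post says sometimes helps ldc.
pragma(inline, false)
float sumChunked(const(float)[] a)
{
    float[chunk] acc = 0;   // one partial sum per lane
    size_t i = 0;

    // Main loop: the inner trip count is a compile-time constant.
    for (; i + chunk <= a.length; i += chunk)
    {
        // Unrolled at compile time into `chunk` independent adds.
        static foreach (k; 0 .. chunk)
            acc[k] += a[i + k];
    }

    // Combine the partial sums.
    float total = 0;
    foreach (v; acc)
        total += v;

    // Scalar tail for the leftover elements.
    for (; i < a.length; ++i)
        total += a[i];

    return total;
}

void main()
{
    auto data = new float[](19);
    foreach (j, ref x; data)
        x = cast(float) j;              // 0, 1, ..., 18

    // 0 + 1 + ... + 18 = 171, exactly representable as a float.
    assert(sumChunked(data) == 171);
}
```

The per-lane accumulators change the order of the floating point additions compared to a plain serial loop, so results can differ slightly for inputs that are not exactly representable.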

Auto vectorization is bad because you never know whether your code 
will still get vectorized the next time you change it and 
recompile.
Just use ISPC: https://ispc.github.io/