DConf 2013 Day 3 Talk 5: Effective SIMD for modern architectures by Manu Evans

Thu Jun 20 07:03:39 PDT 2013

Manu:

> They must be aligned, and multiples of N elements.

The D GC currently allocates arrays 16-byte aligned (but if you 
slice an array, the slice can start at an address that is no 
longer 16-byte aligned). On some new CPUs the penalty for 
misaligned accesses is small.

You often have "n" values, where n is variable. If n is large 
enough and you are using D vector ops, handling the unaligned 
head and the leftover tail doesn't waste too much time. If you 
have very few values, it's much better to use explicit SIMD code.
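
To make the head/tail cost concrete, here is a minimal sketch in C 
(not D, and not how the D runtime implements vector ops). The 
function name add_head_body_tail and the 4-float block size are 
assumptions for illustration only. The inner block loop stands in 
for one 16-byte SIMD add, and the return value counts how many 
elements were processed one at a time.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };  /* assumed: 4 floats per 16-byte vector */

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats.
   Returns the number of elements handled by the scalar head and tail. */
size_t add_head_body_tail(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    size_t scalar = 0;

    /* Head: scalar steps until dst + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(dst + i) % 16) != 0) {
        dst[i] = a[i] + b[i];
        ++i;
        ++scalar;
    }

    /* Body: whole blocks of LANES elements.
       Each block stands in for a single SIMD add. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t k = 0; k < LANES; ++k)
            dst[i + k] = a[i + k] + b[i + k];
    }

    /* Tail: fewer than LANES elements are left over. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
        ++scalar;
    }

    return scalar;
}
```

If dst starts one float past a 16-byte boundary, then n = 3 is 
handled entirely by the scalar head, while n = 12 needs only 4 
scalar steps out of 12. The fixed head/tail cost shrinks in 
proportion as n grows.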

> Well, each are valid comparisons in different situations. I'm 
> not sure how syntax could clearly select the one you want.

Maybe later we'll look for some syntax sugar for this.

>> Are D intrinsics offering instructions to perform prefetching?
>
> Well, GCC does at least. If you're worried about performance at 
> this level, you're probably already using GCC :)

I think D SIMD programmers will expect something functionally 
like __builtin_prefetch to be available in D too:
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fprefetch-3396
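
For reference, a minimal sketch of how the GCC builtin is used 
from C. The function name sum_with_prefetch and the prefetch 
distance of 16 elements are assumptions for illustration, not 
tuned values, and nothing here claims a particular D equivalent 
exists.

```c
#include <stddef.h>

enum { PREFETCH_AHEAD = 16 };  /* assumed distance in elements; needs tuning */

/* Hypothetical example: sum an array while hinting upcoming reads. */
float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        /* Second argument 0 = prefetch for reading,
           third argument 3 = high temporal locality.
           Both must be compile-time constants. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3);
        sum += data[i];
    }
    return sum;
}
```

The builtin is only a hint: it does not change the result, and 
whether it helps depends on the CPU and the access pattern.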

Thank you,
bye,
bearophile