Taking pipeline processing to the next level

Wed Sep 7 05:26:44 PDT 2016

On Wednesday, 7 September 2016 at 11:53:00 UTC, Marc Schütz wrote:
> Would a `vectorize` range adapter be feasible that prepares the 
> input to make it SIMD compatible? That is, force alignment, 
> process left-over elements at the end, etc.? As far as I 
> understand, the problems with auto vectorization stem from a 
> difficulty of compilers to recognize vectorizing opportunities, 
> and (as Manu described) from incompatible semantics of scalar 
> and vector types that the compiler needs to preserve. But if 
> that hypothetical `vectorize` helper forces the input data into 
> one of a handful of well-known formats and types, wouldn't it 
> be possible to make the compilers recognize those (especially 
> if they are accompanied by suitable pragma or other compiler 
> hints)?
>

Contrary to popular belief, alignment is not a showstopper for 
SIMD code. Both Intel and ARM processors have instructions to 
access data at unaligned addresses, and on Intel processors 
there is not even a speed penalty for using them on aligned 
addresses. That means you can either ignore alignment (on Intel) 
or just check the data alignment before you start and choose an 
optimal specialization of the main loop.
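
To make the "check the alignment, then pick a specialization" idea 
concrete, here is a rough C sketch using SSE intrinsics (x86 only). 
The names `scale`, `scale_aligned` and `scale_unaligned` are made up 
for illustration, and the 16-byte check assumes 128-bit vectors. The 
scalar loop at the end of each variant handles the left-over elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical kernel: multiply n floats in place by k.
 * Requires data to be 16-byte aligned. */
static void scale_aligned(float *data, size_t n, float k)
{
    const __m128 vk = _mm_set1_ps(k);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_store_ps(data + i, _mm_mul_ps(_mm_load_ps(data + i), vk));
    for (; i < n; ++i)          /* left-over elements */
        data[i] *= k;
}

/* Same kernel, but using the unaligned load/store instructions. */
static void scale_unaligned(float *data, size_t n, float k)
{
    const __m128 vk = _mm_set1_ps(k);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), vk));
    for (; i < n; ++i)          /* left-over elements */
        data[i] *= k;
}

/* Check the alignment once, up front, and dispatch to a specialization. */
void scale(float *data, size_t n, float k)
{
    assert(data != NULL || n == 0);
    if (((uintptr_t)data & 15u) == 0)
        scale_aligned(data, n, k);
    else
        scale_unaligned(data, n, k);
}
```

On processors where the unaligned instructions cost nothing extra on 
aligned data, `scale` could simply always call `scale_unaligned`.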

However, regarding auto-vectorization, I'm with Manu. I won't bet 
on it, because I have never seen any non-trivial auto-vectorized 
code that comes even close to hand-tuned SIMD code. The compiler 
always has to play it conservatively. It has no idea that you are 
only using 10 bits of each 16-bit component in a vector. It can't 
even help you shuffle RGBARGBARGBARGBA into RRRRGGGGBBBBAAAA. The 
best we can do is to create something that makes writing SIMD 
kernels easy, reusable, and composable.
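
As an illustration of the kind of shuffle meant here, the following 
C sketch deinterleaves four RGBA pixels by hand with two SSE2 byte 
unpacks. `rgba4_to_planar` is a hypothetical name; the code is x86 
only and assumes 8-bit channels. It is one possible way to do it, not 
necessarily the fastest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Hypothetical helper: turn 4 interleaved RGBA pixels (16 bytes)
 * into planar order RRRRGGGGBBBBAAAA (16 bytes). */
void rgba4_to_planar(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);

    /* x = R0 G0 B0 A0 R1 G1 B1 A1 R2 G2 B2 A2 R3 G3 B3 A3 */
    __m128i x = _mm_loadu_si128((const __m128i *)src);

    /* Interleave the low and high halves of x:
     * t = R0 R2 G0 G2 B0 B2 A0 A2 R1 R3 G1 G3 B1 B3 A1 A3 */
    __m128i t = _mm_unpacklo_epi8(x, _mm_srli_si128(x, 8));

    /* Interleave the low and high halves of t:
     * p = R0 R1 R2 R3 G0 G1 G2 G3 B0 B1 B2 B3 A0 A1 A2 A3 */
    __m128i p = _mm_unpacklo_epi8(t, _mm_srli_si128(t, 8));

    _mm_storeu_si128((__m128i *)dst, p);
}
```

With SSSE3 available, the same permutation can be done with a single 
`_mm_shuffle_epi8` and a constant index vector.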