Taking pipeline processing to the next level

Wed Sep 7 04:53:00 PDT 2016

On Wednesday, 7 September 2016 at 10:31:13 UTC, finalpatch wrote:
> I think the problem here is two fold.
>
> First question, how do we combine pipeline stages with minimal 
> overhead
>
> I think the key to this problem is reliable *forceinline*
>
> for example, a pipeline like this
>
> input.map!(x=>x.f1().f2().f3().store(output));
>
> if we could make sure f1(), f2(), f3(), store(), and map() 
> itself are all inlined, then we end up with a single loop with 
> no function calls and the compiler is free to perform cross 
> function optimizations. This is about as good as you can get.  
> Unfortunately at the moment I hear it's difficult to make sure 
> D functions get inlined.
>

If the compiler is unable to inline (or wrongly decides that 
inlining is too costly), I'd consider this a compiler bug. Of 
course, workarounds like `pragma(inline, true)` or `@forceinline` 
might be needed from time to time in practice, but they shouldn't 
influence the design of the pipeline interface.
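
To illustrate that the inlining hint can stay on the stage 
functions and out of the pipeline interface, here is a minimal 
sketch. The bodies of `f1`, `f2` and `f3` and the `run` helper 
are made up for the example; only `pragma(inline, true)` and 
`map` are existing language/library features. Whether the stages 
actually get fused into one loop still depends on the compiler 
and its flags.

```d
import std.algorithm : map;

// Hypothetical stages standing in for f1, f2, f3 from the quote.
// The pragma asks the compiler to inline them at every call site.
pragma(inline, true) int f1(int x) { return x + 1; }
pragma(inline, true) int f2(int x) { return x * 2; }
pragma(inline, true) int f3(int x) { return x - 3; }

// Hypothetical driver: the pipeline itself is plain map!(),
// with no inlining-specific machinery in its interface.
int[] run(int[] input)
{
    int[] output;
    output.reserve(input.length);
    // map is lazy, so the stages only run while foreach consumes it.
    foreach (y; input.map!(x => x.f1.f2.f3))
        output ~= y;
    return output;
}

void main()
{
    assert(run([1, 2, 3]) == [1, 3, 5]);
}
```

If the compiler inlines reliably on its own, the pragmas can be 
dropped without touching `run` or any code that calls it.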

> Second question, how do we combine SIMD pipeline stages with 
> minimal overhead
>
> Besides reliable inlining, we also need some template code to 
> repeat stages until their strides match. This requires details 
> about each stage's logical unit size, input/output type and 
> size at compile time. I can't think of what the interface of 
> this would look like but the current map!() is likely 
> insufficient to support this.

Would a `vectorize` range adapter be feasible that prepares the 
input to make it SIMD compatible? That is, force alignment, 
process left-over elements at the end, etc.? As far as I 
understand, the problems with auto vectorization stem from 
compilers' difficulty in recognizing vectorizing opportunities, 
and (as Manu described) from incompatible semantics of scalar and 
vector types that the compiler needs to preserve. But if that 
hypothetical `vectorize` helper forces the input data into one of 
a handful of well-known formats and types, wouldn't it be 
possible to make the compilers recognize those (especially if 
they are accompanied by suitable pragma or other compiler hints)?
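
A rough sketch of what I have in mind, under several assumptions: 
`vectorize`, `Vectorized`, `Blocks` and `sumOfSquares` are 
hypothetical names, nothing like them exists in Phobos. The 
sketch uses plain static arrays (`T[N]`) as the "well-known 
format" and leaves it to the optimizer to map the array 
operations onto SIMD instructions; a real version might use the 
`core.simd` vector types where they are available. It only 
splits off the left-over elements and does not force alignment.

```d
/// Hypothetical range of fixed-width blocks over a slice.
struct Blocks(T, size_t N)
{
    T[] data; // length is always a multiple of N

    bool empty() { return data.length == 0; }
    // Copy the next N elements into a static array (one "lane group").
    T[N] front() { T[N] v = data[0 .. N]; return v; }
    void popFront() { data = data[N .. $]; }
}

/// Hypothetical result of vectorize: full blocks plus scalar leftovers.
struct Vectorized(T, size_t N)
{
    Blocks!(T, N) blocks; // the part that can be processed N at a time
    T[] tail;             // fewer than N left-over elements
}

/// Hypothetical adapter: split the input into N-wide blocks and a tail.
auto vectorize(size_t N, T)(T[] data)
{
    immutable full = data.length - data.length % N;
    return Vectorized!(T, N)(Blocks!(T, N)(data[0 .. full]),
                             data[full .. $]);
}

/// Example stage: sum of squares, four elements per step.
float sumOfSquares(float[] input)
{
    auto v = input.vectorize!4;
    float[4] acc = 0;
    foreach (block; v.blocks)
        acc[] += block[] * block[]; // element-wise array operation
    float total = acc[0] + acc[1] + acc[2] + acc[3];
    foreach (x; v.tail)             // scalar path for the leftovers
        total += x * x;
    return total;
}

void main()
{
    assert(sumOfSquares([1.0f, 2, 3, 4, 5, 6]) == 91);
}
```

The point of the fixed block type is that every later stage sees 
the same shape of data, so the compiler (or a hand-written SIMD 
overload) has only a few cases to recognize.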

>
> I still don't believe auto-select between scalar or vector 
> paths would be a very useful feature. Normally I would only 
> consider SIMD solution when I know in advance that this is a 
> performance hotspot. When the amount of data is small I simply 
> don't care about performance and would just choose whatever 
> simplest way to do it, like map!(), because the performance 
> impact is not noticeable and definitely not worth the increased 
> complexity.

In the above scenario, you can add `.vectorize` to the pipeline 
to enable vectorizing wherever you need it.