Taking pipeline processing to the next level
Marc Schütz via Digitalmars-d
digitalmars-d at puremagic.com
Wed Sep 7 04:53:00 PDT 2016
On Wednesday, 7 September 2016 at 10:31:13 UTC, finalpatch wrote:
> I think the problem here is two fold.
>
> First question, how do we combine pipeline stages with minimal
> overhead
>
> I think the key to this problem is reliable *forceinline*
>
> for example, a pipeline like this
>
> input.map!(x=>x.f1().f2().f3().store(output));
>
> if we could make sure f1(), f2(), f3(), store(), and map()
> itself are all inlined, then we end up with a single loop with
> no function calls and the compiler is free to perform cross
> function optimizations. This is about as good as you can get.
> Unfortunately at the moment I hear it's difficult to make sure
> D functions get inlined.
>
If the compiler is unable to inline (or wrongly decides it is too
costly), I'd consider this a compiler bug. Of course, workarounds
like `pragma(inline, true)` or `@forceinline` might be needed from
time to time in practice, but they shouldn't influence the design
of the pipeline interface.
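
To illustrate, here is a minimal sketch of what I mean. The stage
functions `f1`, `f2`, `f3` are only stand-ins for the ones in your
example (their bodies are made up), and `runPipeline` is a
hypothetical helper. The inlining hint sits on the stages
themselves, while the pipeline expression stays a plain `map`:

```d
import std.algorithm : map, sum;
import std.range : iota;

// Hypothetical stages; the bodies are invented for illustration.
// pragma(inline, true) asks the compiler to always inline them.
pragma(inline, true) int f1(int x) { return x + 1; }
pragma(inline, true) int f2(int x) { return x * 2; }
pragma(inline, true) int f3(int x) { return x - 3; }

// The pipeline itself carries no inlining annotations at all.
int runPipeline(int n)
{
    return iota(n).map!(x => x.f1.f2.f3).sum;
}

unittest
{
    // (0+1)*2-3 = -1, (1+1)*2-3 = 1, (2+1)*2-3 = 3
    assert(runPipeline(3) == 3);
}
```

Whether this really collapses into a single loop still depends on
the compiler and its flags, of course.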
> Second question, how do we combine SIMD pipeline stages with
> minimal overhead
>
> Besides reliable inlining, we also need some template code to
> repeat stages until their strides match. This requires details
> about each stage's logical unit size, input/output type and
> size at compile time. I can't think of what the interface of
> this would look like but the current map!() is likely
> insufficient to support this.
Would a `vectorize` range adapter be feasible that prepares the
input to make it SIMD compatible? That is, force alignment,
process left-over elements at the end, etc.? As far as I
understand, the problems with auto vectorization stem from the
difficulty compilers have in recognizing vectorization
opportunities, and (as Manu described) from incompatible semantics
of scalar and vector types that the compiler needs to preserve.
But if that hypothetical `vectorize` helper forces the input data
into one of a handful of well-known formats and types, wouldn't it
be possible to make the compilers recognize those (especially if
they are accompanied by suitable pragmas or other compiler hints)?
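
A very rough sketch of the shape I have in mind, purely as an
assumption about how such a helper could look: `vectorize`,
`Vectorized`, `lanes` and `sumOfSquares` are all made-up names,
the block width of 4 is arbitrary, and alignment is not handled at
all. It only splits the input into fixed-width blocks plus a
scalar tail, so that the hot loop works on a known, regular shape:

```d
import std.range : chunks;

enum size_t lanes = 4; // assumed block width, e.g. 4 floats

// Hypothetical result of `vectorize`: full blocks plus left-overs.
struct Vectorized(T)
{
    T[] head; // length is a multiple of `lanes`
    T[] tail; // fewer than `lanes` left-over elements

    // Range of slices, each exactly `lanes` elements long.
    auto blocks() { return head.chunks(lanes); }
}

auto vectorize(T)(T[] input)
{
    immutable split = input.length - input.length % lanes;
    return Vectorized!T(input[0 .. split], input[split .. $]);
}

float sumOfSquares(const(float)[] data)
{
    auto v = data.vectorize;
    float[lanes] acc = 0;
    foreach (block; v.blocks)
    {
        // Fixed-width array operation; a compiler may turn this
        // into SIMD instructions, but that is not guaranteed.
        acc[] += block[] * block[];
    }
    float total = 0;
    foreach (a; acc) total += a;
    // Left-over elements are handled on the scalar path.
    foreach (x; v.tail) total += x * x;
    return total;
}

unittest
{
    assert(sumOfSquares([1f, 2, 3, 4, 5, 6]) == 91);
}
```

A real adapter would additionally have to deal with alignment and
would probably expose proper vector types instead of slices.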
>
> I still don't believe auto-select between scalar or vector
> paths would be a very useful feature. Normally I would only
> consider SIMD solution when I know in advance that this is a
> performance hotspot. When the amount of data is small I simply
> don't care about performance and would just choose whatever
> simplest way to do it, like map!(), because the performance
> impact is not noticeable and definitely not worth the increased
> complexity.
In the above scenario, you can add `.vectorize` to the pipeline
to enable vectorizing wherever you need it.