Taking pipeline processing to the next level
finalpatch via Digitalmars-d
digitalmars-d at puremagic.com
Wed Sep 7 05:26:44 PDT 2016
On Wednesday, 7 September 2016 at 11:53:00 UTC, Marc Schütz wrote:
> Would a `vectorize` range adapter be feasible that prepares the
> input to make it SIMD compatible? That is, force alignment,
> process left-over elements at the end, etc.? As far as I
> understand, the problems with auto vectorization stem from a
> difficulty of compilers to recognize vectorizing opportunities,
> and (as Manu described) from incompatible semantics of scalar
> and vector types that the compiler needs to preserve. But if
> that hypothetical `vectorize` helper forces the input data into
> one of a handful of well-known formats and types, wouldn't it
> be possible to make the compilers recognize those (especially
> if they are accompanied by suitable pragma or other compiler
> hints)?
>
Contrary to popular belief, alignment is not a showstopper for
SIMD code. Both Intel and ARM processors have instructions to
access data at unaligned addresses. And on Intel processors,
there is not even a speed penalty for using those instructions on
addresses that happen to be aligned. That means you can either
ignore alignment altogether (on Intel) or just check the data
alignment before you start and choose the optimal specialization
of the main loop.
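
A minimal sketch of the "check alignment once, then pick a loop"
idea, written in C with SSE intrinsics. The kernel (a float sum)
and the name sum_floats are hypothetical and chosen only for
illustration; on targets without SSE2 the code falls back to the
plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical kernel: sum n floats, choosing the main loop once
 * based on the alignment of the input pointer. */
float sum_floats(const float *p, size_t n)
{
    assert(p != NULL || n == 0);

    float total = 0.0f;
    size_t i = 0;

#if defined(__SSE2__)
    __m128 acc = _mm_setzero_ps();

    if (((uintptr_t)p & 15u) == 0) {
        /* Specialization for 16-byte aligned input. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    } else {
        /* Specialization for unaligned input. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    }

    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif

    /* Left-over elements (or everything, without SSE2). */
    for (; i < n; i++)
        total += p[i];

    return total;
}
```

The alignment test runs once per call, not once per element, so
its cost is negligible next to the loop itself. The "forget it"
option corresponds to keeping only the _mm_loadu_ps loop.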
However, regarding auto vectorization, I'm with Manu. I won't bet
on auto vectorization, because I have never seen any non-trivial
auto-vectorized code that comes even close to hand-tuned SIMD
code. The compiler always has to play it safe. It has no idea
that you are only using 10 bits of each 16-bit component in a
vector. It can't even help you shuffle RGBARGBARGBARGBA into
RRRRGGGGBBBBAAAA. The best we can do is to create something that
makes writing SIMD kernels easy, reusable and composable.
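
For illustration, here is what the RGBA shuffle mentioned above
looks like when written by hand. This is only a sketch in C using
SSE2 intrinsics, with a scalar fallback for other targets; it
assumes 8-bit channels and exactly four pixels per call, and the
name rgba4_to_planar is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Turn four interleaved 8-bit RGBA pixels (RGBARGBARGBARGBA)
 * into planar order (RRRRGGGGBBBBAAAA). */
void rgba4_to_planar(const uint8_t src[16], uint8_t dst[16])
{
    assert(src != dst); /* the scalar fallback cannot run in place */

#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)src);

    /* Interleave the low and high 8 bytes:
     * R0 R2 G0 G2 B0 B2 A0 A2 R1 R3 G1 G3 B1 B3 A1 A3 */
    __m128i t = _mm_unpacklo_epi8(v, _mm_srli_si128(v, 8));

    /* Interleave the halves again:
     * R0 R1 R2 R3 G0 G1 G2 G3 B0 B1 B2 B3 A0 A1 A2 A3 */
    __m128i r = _mm_unpacklo_epi8(t, _mm_srli_si128(t, 8));

    _mm_storeu_si128((__m128i *)dst, r);
#else
    /* Scalar fallback: a 4x4 byte transpose. */
    for (size_t px = 0; px < 4; px++)
        for (size_t ch = 0; ch < 4; ch++)
            dst[ch * 4 + px] = src[px * 4 + ch];
#endif
}
```

The whole transpose is two unpack steps, each fed by a byte
shift. A reusable SIMD kernel library could wrap this kind of
building block so that it only has to be written once.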