core.simd woes

Manu turkeyman at gmail.com
Mon Aug 6 08:15:20 PDT 2012


On 5 August 2012 06:33, F i L <witte2008 at gmail.com> wrote:

> core.simd vectors are limited in a couple of annoying ways. First, if I
> define:
>
>     @property pure nothrow
>     {
>         auto x(float4 v) { return v.ptr[0]; }
>         auto y(float4 v) { return v.ptr[1]; }
>         auto z(float4 v) { return v.ptr[2]; }
>         auto w(float4 v) { return v.ptr[3]; }
>
>         void x(ref float4 v, float val) { v.ptr[0] = val; }
>         void y(ref float4 v, float val) { v.ptr[1] = val; }
>         void z(ref float4 v, float val) { v.ptr[2] = val; }
>         void w(ref float4 v, float val) { v.ptr[3] = val; }
>     }
>
> Then use it like:
>
>     float4 a, b;
>
>     a.x = a.x + b.x;
>
> it's actually somehow faster than directly using:
>
>     a.ptr[0] += b.ptr[0];
>
> However, notice that I can't use '+=' in the first case, because 'x' isn't
> an lvalue. That's really annoying. Moreover, I can't assign a vector from
> anything other than an array of constant expressions, which means I have
> to write functions just to assign vectors in a convenient way.
>
>     float rand = ...;
>     float4 vec = [rand, 1, 1, 1]; // ERROR: expected constant
>
>
> Now, none of this would be an issue at all if I could wrap core.simd
> vectors in custom structs... but doing that completely negates their
> performance gain (I'm guessing because of boxing?). It's the difference
> between a 2-10x speed improvement using float4 directly (depending on
> CPU) and only a few milliseconds' improvement when wrapping float4 in a
> struct.
>
> So it's not my ideal situation, but I wouldn't mind at all having to use
> core.simd vector types directly and moving things like
> dot/cross/normalize/etc. to external functions; but if that's the case,
> then I would _really_ like some basic usability features added to the
> vector types.
>
> Mono C#'s Mono.Simd.Vector4f and related types have these basic
> features, and working with them is much nicer than using D's core.simd
> vectors.
>
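
(As for the `[rand, 1, 1, 1]` error: until vector literals accept runtime
values, one workaround is to fill the lanes through .ptr, the same
mechanism your accessors already use. A minimal sketch, core.simd only:)

    import core.simd;

    float rand = 0.5f;  // stand-in for the runtime value
    float4 vec;
    vec.ptr[0] = rand;  // lane-by-lane stores sidestep the
    vec.ptr[1] = 1;     // constant-expression restriction
    vec.ptr[2] = 1;
    vec.ptr[3] = 1;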

I think core.simd is only designed for the lowest level of access to the
SIMD hardware. I started writing std.simd some time back; it is mostly
finished in a fork, but there are some bugs/missing features in D's SIMD
support preventing me from finishing/releasing it (incomplete dmd
implementation, missing intrinsics, no SIMD literals, can't do unit
testing, etc.).

The intention was that std.simd would be a flat, C-style API: the lowest
level required for practical and portable use. It's almost done, and it
should make it a lot easier for people to build their own SIMD libraries
on top. It supplies the most useful linear-algebra operations and
implements them as efficiently as possible on architectures other than
just SSE.
Take a look: https://github.com/TurkeyMan/phobos/blob/master/std/simd.d
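
To give a feel for the intended style, here's a rough usage sketch (the
function names here are illustrative and may not match the fork exactly):

    import std.simd;

    float4 a, b;
    // flat calls: float4 in, float4 out, no wrapper structs, so
    // everything stays in vector registers
    float4 n = normalise(cross3(a, b));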

On a side note, about your example where you perform a scalar add within a
vector: this is bad, don't ever do this. SSE (i.e., x86) is the most
tolerant architecture in this regard, but it's VERY bad SIMD design. You
should never perform component-wise arithmetic when working with SIMD;
it's absolutely not portable. On most architectures, moving individual
components between vector and scalar registers is expensive, so a good
rule of thumb is: if the keyword 'float' appears anywhere that interacts
with your SIMD code, you are likely to see worse performance than just
using float[4]. Better to factor your code to eliminate any scalar work,
make sure 'scalars' are broadcast across all 4 components, and keep doing
4d operations.
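
For example (a minimal sketch using only core.simd; 'splat' is just a
local helper name):

    import core.simd;

    // broadcast a scalar into all four lanes once, up front
    float4 splat(float s)
    {
        float4 v;
        v.ptr[0] = s;
        v.ptr[1] = s;
        v.ptr[2] = s;
        v.ptr[3] = s;
        return v;
    }

    // the hot loop then does only 4-wide arithmetic; no 'float'
    // touches the vectors inside it
    void scaleAll(float4[] verts, float s)
    {
        float4 factor = splat(s);
        foreach (ref v; verts)
            v *= factor;
    }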

Instead of:

    @property pure nothrow float x(float4 v) { return v.ptr[0]; }

Better to use:

    @property pure nothrow float4 x(float4 v) { return swizzle!"xxxx"(v); }