core.simd woes

Manu turkeyman at gmail.com
Tue Aug 7 00:32:33 PDT 2012


On 6 August 2012 22:57, jerro <a at a.com> wrote:

>> The intention was that std.simd would be a flat C-style API, which would
>> be the lowest level required for practical and portable use.
>>
>
> Since LDC and GDC implement intrinsics with an API different from the one
> used in DMD, there are actually two kinds of portability we need to worry
> about: portability across different compilers and portability across
> different architectures. std.simd solves both of those problems, which is
> great for many use cases (for example when dealing with geometric
> vectors), but it doesn't help when you want to use architecture-dependent
> functionality directly. In that case one would want an interface as close
> to the actual instructions as possible, but uniform across compilers.
> I think we should define such an interface as functions and templates in
> core.simd, so you would have, for example:
>
> float4 unpcklps(float4, float4);
> float4 shufps(int i0, int i1, int i2, int i3)(float4, float4);
>

I can see your reasoning, but I think that should be in core.sse or
core.simd.sse, personally. Otherwise you'll end up with VMX, NEON, etc. all
blobbed into one huge intrinsic-wrapper file.
That said, almost all SIMD opcodes are directly accessible in std.simd.
There are relatively few obscure operations that don't have a corresponding
function.
Take the unpck/shuf example above: both effectively perform a sort of
swizzle, and both are accessible through swizzle!(). The swizzle mask is
analysed by the template, which produces the best opcode to match the
pattern. Take a look at swizzle; it's bloody complicated to do that the
most efficient way on x86. Other architectures are not so much trouble ;)
So while you may argue that it might be simpler to use an opcode intrinsic
wrapper directly, the opcode is actually still directly accessible via
swizzle with an appropriate swizzle arrangement, which it might also be
argued is more readable to the end user, since the result of the opcode is
clearly written...
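
For illustration, a two-operand swizzle interface is along these lines.
This is only a minimal element-wise sketch under an assumed mask convention
(digits 0-3 pick elements of the first vector, 4-7 of the second), not the
real std.simd code, which analyses the mask at compile time and emits a
single unpck/shuf/etc. opcode wherever the pattern allows it:

import core.simd;

// Hypothetical two-operand swizzle; the reference body just gathers
// element-wise so the interface is clear.
float4 swizzle2(string mask)(float4 a, float4 b)
    if (mask.length == 4)
{
    float4 r;
    foreach (i, c; mask)
    {
        immutable idx = c - '0';    // 0-3 selects from a, 4-7 from b
        r.array[i] = idx < 4 ? a.array[idx] : b.array[idx - 4];
    }
    return r;
}

// e.g. swizzle2!"0415"(a, b) is the unpcklps pattern,
//      swizzle2!"2637"(a, b) is the unpckhps pattern.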


> Then each compiler would implement this API in its own way. DMD would use
> __simd (1), GDC would use GCC builtins, and LDC would use LLVM intrinsics
> and shufflevector. If we don't include something like that in core.simd,
> many applications will need to implement their own versions of it. Using
> this would also reduce the amount of code needed to implement std.simd
> (currently most of std.simd only supports GDC, and it's already pretty
> large). What do you think about adding such an API to core.simd?
>
> (1) Some way to support the rest of SSE instructions needs to be added to
> DMD, of course.
>

The reason I haven't written the DMD support yet is that it was incomplete
and many opcodes weren't yet accessible (shuf, for instance)... I just
wasn't finished, and stopped to wait for DMD to be feature-complete.
I'm not opposed to this idea, although I do have a concern: because there's
no __forceinline in D (or macros), adding another layer of abstraction will
make maths code REALLY slow in unoptimised builds.
Can you suggest a method where these would be treated like C macros and not
produce additional layers of function calls? I'm already unhappy that
std.simd produces redundant function calls.
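
For the record, something along these lines is what I imagine you mean: one
wrapper per opcode with a per-compiler body. Treat it as a rough sketch;
the names used here (gcc.builtins' __builtin_ia32_unpcklps, ldc.simd's
shufflevector, DMD's XMM.UNPCKLPS) are my assumptions about what each
compiler exposes, not a claim about what's actually available today:

import core.simd;

version (GNU) import gcc.builtins;   // GDC: GCC builtins
version (LDC) import ldc.simd;       // LDC: LLVM intrinsics / shufflevector

float4 unpcklps(float4 a, float4 b)
{
    version (GNU)
        return __builtin_ia32_unpcklps(a, b);
    else version (LDC)
        return shufflevector!(float4, 0, 4, 1, 5)(a, b);
    else version (DigitalMars)
        return cast(float4) __simd(XMM.UNPCKLPS, a, b);
    else
        static assert(0, "no SIMD support for this compiler");
}

But without forced inlining, every one of those wrappers is still a real
function call in an unoptimised build, which is exactly the overhead I'm
worried about.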


<rant> please please please can haz __forceinline! </rant>