core.simd woes

Wed Oct 10 01:09:26 PDT 2012

On 9 October 2012 20:46, jerro <a at a.com> wrote:

> On Tuesday, 9 October 2012 at 16:59:58 UTC, Jacob Carlborg wrote:
>
>> On 2012-10-09 16:52, Simen Kjaeraas wrote:
>>
>>> Nope, like:
>>>
>>> module std.simd;
>>>
>>> version(Linux64) {
>>>     public import std.internal.simd_linux64;
>>> }
>>>
>>>
>>> Then all std.internal.simd_* modules have the same public interface, and
>>> only the version that fits /your/ platform will be included.
>>>
>>
>> Exactly, what he said.
>>
>
> I'm guessing the platform in this case would be the CPU architecture,
> since that determines what SIMD instructions are available, not the OS. But
> anyway, this does not address the problem Manu was talking about. The
> problem is that the API for the intrinsics for the same architecture is not
> consistent across compilers. So for example, if you wanted to generate the
> instruction "shufps XMM1, XMM2, 0x88" (this extracts all even elements from
> two vectors), you would need to write:
>
> version(GNU)
> {
>     return __builtin_ia32_shufps(a, b, 0x88);
> }
> else version(LDC)
> {
>     return shufflevector(a, b, 0, 2, 4, 6);
> }
> else version(DMD)
> {
>     // can't do that in DMD yet, but the way to do it will probably be
>     // different from the way it is done in LDC and GDC
> }
>
> What Manu meant with having std.simd.sse and std.simd.neon was to have
> modules that would provide access to the platform dependent instructions
> that would be portable across compilers. So for the shufps instruction
> above you would have something like this in std.simd.sse:
>
> float4 shufps(int i0, int i1, int i2, int i3)(float4 a, float4 b){ ... }
>
> std.simd currently takes care of cases when the code can be written in a
> cross platform way. But when you need to use platform specific instructions
> directly, std.simd doesn't currently help you, while std.simd.sse,
> std.simd.neon and others would. What Manu is worried about is that having
> instructions wrapped in another level of functions would hurt performance.
> It certainly would slow things down in debug builds (and IIRC he has
> written in his previous posts that he does care about that). I don't think
> it would make much of a difference when compiled with optimizations turned
> on, at least not with LDC and GDC.
>

Perfect! You saved me writing anything at all ;)

I do indeed care about debug builds, but one interesting possibility that I
discussed with Walter last week was a #pragma inline statement, which may
force-enable inlining even in debug. I'm not sure how that would translate
to GDC/LDC, and that's an important consideration. I'd also like to prove
that the code-gen does work well with 2 or 3 levels of inlining, and that
the optimiser is still able to perform sensible code reordering in the
target context.
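
For reference, here is a rough sketch of the wrapper shape jerro describes in the quoted text. It is not std.simd code. Float4 is a plain float[4] stand-in for core.simd's float4 so that it runs on any compiler, and shufpsMask is a made-up helper name. A real std.simd.sse would presumably replace the scalar body with version(GNU) / version(LDC) / version(DMD) branches like the ones quoted above. The sketch only shows how compile-time lane indices map to the 0x88 immediate and what result the instruction is expected to give.

```d
// Scalar stand-in for core.simd's float4 (assumption, so the sketch is portable).
alias Float4 = float[4];

// Hypothetical helper: packs four 2-bit lane selectors into the 8-bit
// immediate operand that SHUFPS takes.
enum ubyte shufpsMask(int i0, int i1, int i2, int i3) =
    cast(ubyte)(i0 | (i1 << 2) | (i2 << 4) | (i3 << 6));

// Hypothetical portable wrapper with the signature proposed in the thread.
// The body is a scalar reference implementation, not a real intrinsic call.
Float4 shufps(int i0, int i1, int i2, int i3)(Float4 a, Float4 b)
{
    static assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4 &&
                  i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4,
                  "lane indices must be in the range 0..3");

    // SHUFPS semantics: the two low result lanes are picked from a,
    // the two high result lanes are picked from b.
    Float4 r = [a[i0], a[i1], b[i2], b[i3]];
    return r;
}

void main()
{
    Float4 a = [0.0f, 1.0f, 2.0f, 3.0f];
    Float4 b = [4.0f, 5.0f, 6.0f, 7.0f];

    // Selecting lanes (0, 2) of a and (0, 2) of b yields the 0x88 immediate.
    static assert(shufpsMask!(0, 2, 0, 2) == 0x88);

    // The even elements of both vectors end up in the result.
    Float4 expected = [0.0f, 2.0f, 4.0f, 6.0f];
    assert(shufps!(0, 2, 0, 2)(a, b) == expected);
}
```

Because the indices are template parameters, the immediate is known at compile time in every backend, and an out-of-range index is rejected by the static assert rather than at run time. Note that the LDC form quoted above numbers lanes across both inputs (0, 2, 4, 6), whereas this signature uses per-operand indices (0, 2, 0, 2), so a real wrapper would need to add 4 to the last two indices on that path.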