core.simd woes
jerro
a at a.com
Tue Oct 9 10:46:39 PDT 2012
On Tuesday, 9 October 2012 at 16:59:58 UTC, Jacob Carlborg wrote:
> On 2012-10-09 16:52, Simen Kjaeraas wrote:
>
>> Nope, like:
>>
>> module std.simd;
>>
>> version(Linux64) {
>> public import std.internal.simd_linux64;
>> }
>>
>>
>> Then all std.internal.simd_* modules have the same public
>> interface, and
>> only the version that fits /your/ platform will be included.
>
> Exactly, what he said.
I'm guessing the platform in this case would be the CPU
architecture, since that, not the OS, determines which SIMD
instructions are available. But anyway, this does not address the
problem Manu was talking about. The problem is that the API for
the intrinsics for the same architecture is not consistent across
compilers. So for example, if you wanted to generate the
instruction "shufps XMM1, XMM2, 0x88" (this extracts the even
elements from two vectors), you would need to write:
    version(GNU)
    {
        return __builtin_ia32_shufps(a, b, 0x88);
    }
    else version(LDC)
    {
        return shufflevector(a, b, 0, 2, 4, 6);
    }
    else version(DMD)
    {
        // can't do that in DMD yet, but the way to do it will
        // probably be different from the way it is done in LDC and GDC
    }
What Manu meant by having std.simd.sse and std.simd.neon was to
have modules that would provide access to the platform-dependent
instructions in a way that would be portable across compilers. So
for the shufps instruction above you would have something like
this in std.simd.sse:
    float4 shufps(int i0, int i1, int i2, int i3)(float4 a, float4 b){ ... }
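
To make the interface a bit more concrete, here is a rough sketch
of how the compile-time indices could map to the shufps immediate.
This is only an illustration under my own assumptions: shufpsMask
and Vec4 are hypothetical names, float[4] stands in for float4 so
that it compiles with any compiler, and the body is a scalar
reference rather than a real intrinsic call. A real std.simd.sse
would put the compiler-specific version blocks inside the function
instead.

```d
// Hypothetical sketch: shufpsMask and this shufps are not existing library APIs.
// float[4] stands in for core.simd's float4 so the example runs on any compiler.
alias Vec4 = float[4];

// Pack four 2-bit lane selectors into the 8-bit shufps immediate.
template shufpsMask(int i0, int i1, int i2, int i3)
{
    static assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4 &&
                  i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4,
                  "each selector must be in the range 0 .. 4");
    enum int shufpsMask = i0 | (i1 << 2) | (i2 << 4) | (i3 << 6);
}

// Scalar reference for shufps: the two low lanes are picked from a,
// the two high lanes from b.
Vec4 shufps(int i0, int i1, int i2, int i3)(Vec4 a, Vec4 b)
{
    // The value a compiler-specific backend would pass as the immediate.
    enum mask = shufpsMask!(i0, i1, i2, i3);
    Vec4 r = [a[mask & 3], a[(mask >> 2) & 3],
              b[(mask >> 4) & 3], b[(mask >> 6) & 3]];
    return r;
}

void main()
{
    Vec4 a = [0, 1, 2, 3];
    Vec4 b = [4, 5, 6, 7];

    // Selectors (0, 2, 0, 2) give the 0x88 immediate from the example above.
    static assert(shufpsMask!(0, 2, 0, 2) == 0x88);

    // The even elements of a followed by the even elements of b.
    Vec4 r = shufps!(0, 2, 0, 2)(a, b);
    assert(r == [0f, 2f, 4f, 6f]);
}
```

Since the indices are template parameters, the mask is a
compile-time constant, which is what the immediate operand of the
instruction requires.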
std.simd currently takes care of cases where the code can be
written in a cross-platform way. But when you need to use
platform-specific instructions directly, std.simd doesn't
currently help you, while std.simd.sse, std.simd.neon and others
would. What Manu is worried about is that having instructions
wrapped in another level of functions would hurt performance. It
certainly would slow things down in debug builds (and IIRC he has
written in his previous posts that he does care about that). I
don't think it would make much of a difference when compiled with
optimizations turned on, at least not with LDC and GDC.