System programming in D (Was: The God Language)
a
a at a.com
Thu Dec 29 14:14:57 PST 2011
David Nadlinger Wrote:
> On 12/29/11 2:13 PM, a wrote:
> > void test(ref V a, ref V b)
> > {
> >     asm
> >     {
> >         movaps XMM0, a;
> >         addps XMM0, b;
> >         movaps a, XMM0;
> >     }
> >     asm
> >     {
> >         movaps XMM0, a;
> >         addps XMM0, b;
> >         movaps a, XMM0;
> >     }
> > }
> >
> > [...]
> >
> > The needless loads and stores would make it impossible to write an efficient SIMD add function even if the functions containing asm blocks could be inlined.
>
> Yes, this is indeed a problem, and as far as I'm aware, usually solved
> in the gamedev world by using the (SSE) intrinsics your favorite C++
> compiler provides, instead of resorting to inline asm.
>
> David
IIRC Walter doesn't want to add vector intrinsics, so it would be nice if functions that perform vector operations could be written efficiently using inline assembly. That would also be a more general solution than intrinsics. Something like this is already possible with GCC's extended inline assembly, where the compiler picks the registers for the operands. For example, this:
typedef float v4sf __attribute__((vector_size(16)));

void vadd(v4sf *a, v4sf *b)
{
    /* "x" = any SSE register; "0" ties the second input to output %0 */
    asm(
        "addps %1, %0"
        : "=x" (*a)
        : "x" (*b), "0" (*a)
        : );
}

void test(float * __restrict__ a, float * __restrict__ b)
{
    v4sf * va = (v4sf*) a;
    v4sf * vb = (v4sf*) b;
    vadd(va,vb);
    vadd(va,vb);
    vadd(va,vb);
    vadd(va,vb);
}
compiles to:
00000000004004c0 <test>:
  4004c0:  0f 28 0e    movaps (%rsi),%xmm1
  4004c3:  0f 28 07    movaps (%rdi),%xmm0
  4004c6:  0f 58 c1    addps  %xmm1,%xmm0
  4004c9:  0f 58 c1    addps  %xmm1,%xmm0
  4004cc:  0f 58 c1    addps  %xmm1,%xmm0
  4004cf:  0f 58 c1    addps  %xmm1,%xmm0
  4004d2:  0f 29 07    movaps %xmm0,(%rdi)
This should also be possible with GDC, but I couldn't figure out how to get something like __restrict__ there. If you want to use vector types and GCC extended inline assembly with GDC, see http://www.digitalmars.com/d/archives/D/gnu/Support_for_gcc_vector_attributes_SIMD_builtins_3778.html and https://bitbucket.org/goshawk/gdc/wiki/UserDocumentation.
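As a side note, the "=x" / "0" operand pair in vadd can probably be expressed more compactly with a single read-write operand ("+x"). The following is only a sketch under that assumption: vadd_rw and vadd4 are names made up here for illustration, and the #if guard with its plain vector-extension fallback is an addition so the snippet still builds on targets without SSE.

```c
/* GCC/Clang vector extension: four packed floats, 16-byte aligned. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical variant of vadd: "+x" marks *a as both input and output,
   held in an SSE register chosen by the compiler. */
static inline void vadd_rw(v4sf *a, const v4sf *b)
{
#if defined(__SSE__)
    __asm__("addps %1, %0"      /* AT&T operand order: %0 += %1 */
            : "+x" (*a)
            : "x" (*b));
#else
    *a = *a + *b;               /* fallback (assumption): vector-extension add */
#endif
}

/* Hypothetical caller mirroring test() above: four adds in a row. */
void vadd4(v4sf *a, const v4sf *b)
{
    vadd_rw(a, b);
    vadd_rw(a, b);
    vadd_rw(a, b);
    vadd_rw(a, b);
}
```

Called with a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, vadd4 should leave a = {41, 82, 123, 164}. Whether the four adds collapse into one load and one store, as in the listing above, depends on the compiler and optimization level; the generated code for this variant was not checked here.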