System programming in D (Was: The God Language)

Thu Dec 29 14:14:57 PST 2011

David Nadlinger Wrote:

> On 12/29/11 2:13 PM, a wrote:
> > void test(ref V a, ref V b)
> > {
> >      asm
> >      {
> >          movaps XMM0, a;
> >          addps  XMM0, b;
> >          movaps a, XMM0;
> >      }
> >      asm
> >      {
> >          movaps XMM0, a;
> >          addps  XMM0, b;
> >          movaps a, XMM0;
> >      }
> > }
> >
> > […]
> >
> > The needless loads and stores would make it impossible to write an efficient SIMD add function, even if functions containing asm blocks could be inlined.
> 
> Yes, this is indeed a problem, and as far as I'm aware, usually solved 
> in the gamedev world by using the (SSE) intrinsics your favorite C++ 
> compiler provides, instead of resorting to inline asm.
> 
> David
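For comparison, the intrinsics approach might look like this in C. This is a sketch that assumes an x86 target and the standard <xmmintrin.h> header. test_intrinsics is a hypothetical name, and both arrays are assumed to be 16-byte aligned:

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps */

/* Hypothetical helper: a += b four times using SSE intrinsics instead of asm.
   Both pointers must be 16-byte aligned, because _mm_load_ps/_mm_store_ps
   correspond to movaps, which faults on unaligned addresses. */
void test_intrinsics(float *restrict a, const float *restrict b)
{
    __m128 va = _mm_load_ps(a);   /* load a once */
    __m128 vb = _mm_load_ps(b);   /* load b once */
    va = _mm_add_ps(va, vb);      /* each call maps to addps */
    va = _mm_add_ps(va, vb);
    va = _mm_add_ps(va, vb);
    va = _mm_add_ps(va, vb);
    _mm_store_ps(a, va);          /* store the result once */
}
```

With a = {1, 2, 3, 4} and b = {0.5, 0.5, 0.5, 0.5}, the call leaves a = {3, 4, 5, 6}. Because the compiler knows what _mm_add_ps does, it can keep va and vb in registers across all four additions. The separate asm blocks in the quoted D example prevent exactly this.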

IIRC Walter doesn't want to add vector intrinsics, so it would be nice if vector operations could be written efficiently using inline assembly. That would also be a more general solution than intrinsics. Something like this is already possible with GCC extended inline assembly. For example, this:

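/* Four packed floats in one 16-byte vector (GCC vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* *a += *b. "=x" places the output in an SSE register; the "0"
   constraint ties the input *a to that same register, so addps
   updates it in place. */
void vadd(v4sf *a, v4sf *b)
{
    asm(
        "addps %1, %0" 
        : "=x" (*a) 
        : "x" (*b), "0" (*a)
        : );
}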

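/* __restrict__ promises that a and b do not alias, which lets the
   compiler keep *a and *b in registers across the four calls instead
   of storing and reloading. Both must be 16-byte aligned. */
void test(float * __restrict__ a, float * __restrict__ b)
{
    v4sf * va = (v4sf*) a;
    v4sf * vb = (v4sf*) b;
    vadd(va,vb);
    vadd(va,vb);
    vadd(va,vb);
    vadd(va,vb);
}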

compiles to:

00000000004004c0 <test>:
  4004c0:       0f 28 0e                movaps (%rsi),%xmm1
  4004c3:       0f 28 07                movaps (%rdi),%xmm0
  4004c6:       0f 58 c1                addps  %xmm1,%xmm0
  4004c9:       0f 58 c1                addps  %xmm1,%xmm0
  4004cc:       0f 58 c1                addps  %xmm1,%xmm0
  4004cf:       0f 58 c1                addps  %xmm1,%xmm0
  4004d2:       0f 29 07                movaps %xmm0,(%rdi)

This should also be possible with GDC, but I couldn't figure out how to get an equivalent of __restrict__. If you want to use vector types and GCC extended inline assembly with GDC, see http://www.digitalmars.com/d/archives/D/gnu/Support_for_gcc_vector_attributes_SIMD_builtins_3778.html and https://bitbucket.org/goshawk/gdc/wiki/UserDocumentation.