Support for gcc vector attributes, SIMD builtins

Tue Feb 1 10:38:30 PST 2011

== Quote from Jerry Quinn (jlquinn at optonline.net)'s article
> Iain Buclaw Wrote:
> > == Quote from Mike Farnsworth (mike.farnsworth at gmail.com)'s article
> > > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > > the code generator did with it.  It was able to partially drop into SSE
> > > registers and instructions, but not as well as I had hoped from writing
> > > "regular" D code.
> > > I poked through the builtins that get pulled into d-builtins.c /
> > > d-builtins2.cc but I don't see anything that might be pulling in
> > > definitions such as __builtin_ia32_* for SSE, for example.
> > > How hard would it be to get some sort of vector attribute attached to a
> > > type (or just plain introduce v4sf, __m128, or something like that) and
> > > get those SIMD builtins available?
> >
> > Saying that, a workaround is to use array types.
> > typedef float[4] __m128;
> > typedef float[4] __v4sf;
> >
> >
> > All the more reason to show you that pragma(attribute) is still too incomplete
> > to use. Any ideas to improve it are welcome though. :)
> The workaround actually looks like a cleaner way to define types for vector
> intrinsics.  How hard would it be to export vector intrinsics so the API expects
> float[4], for example?

I haven't given much thought to what the internal representation could be, but
I'd lean towards using unions in D code for use in the language, as that's
probably the most portable.

For example, one of the older 'hello vectors' I know of:

import std.c.stdio;

pragma(set_attribute, __v4sf, vector_size(16));
typedef float __v4sf;

union f4vector
{
    __v4sf v;
    float[4] f;
}

int main()
{
    f4vector a, b, c;

    a.f = [1, 2, 3, 4];
    b.f = [5, 6, 7, 8];

    c.v = a.v + b.v;
    printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);

    return 0;
}

Compile: gdc -c -g -msse hellovector.d
Dump Object: objdump -dS hellovector.o

And the output of the SIMD operation speaks for itself:

c.v = a.v + b.v;
  xorps  %xmm1,%xmm1
  movlps %gs:0x0,%xmm1
  movhps %gs:0x8,%xmm1
  xorps  %xmm0,%xmm0
  movlps %gs:0x0,%xmm0
  movhps %gs:0x8,%xmm0
  addps  %xmm1,%xmm0
  movlps %xmm0,%gs:0x0
  movhps %xmm0,%gs:0x8
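
For comparison, the array-type workaround mentioned earlier (plain float[4]
instead of a vector-attributed type) can be sketched with D's built-in array
operations. This is only a rough sketch, assuming a D2 compiler where static
arrays are value types; add4 is a made-up helper name, not anything gdc
provides. Whether the compiler turns the loop into addps is not shown here and
would have to be checked with objdump as above.

```d
import std.stdio;

// Hypothetical helper: element-wise add of two float[4] values using
// D array operations. No vector attribute or builtin is involved.
float[4] add4(float[4] a, float[4] b)
{
    float[4] c;
    c[] = a[] + b[];   // array operation: c[i] = a[i] + b[i] for each i
    return c;
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [5, 6, 7, 8];

    float[4] c = add4(a, b);
    writefln("%s, %s, %s, %s", c[0], c[1], c[2], c[3]);
}
```

The appeal of this form is that it needs no pragma at all, so it stays valid D
on any compiler; the cost is that any use of SSE is left entirely to the
optimizer.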

Regards.
Iain