Support for gcc vector attributes, SIMD builtins

Tue Feb 1 11:00:52 PST 2011

Iain Buclaw Wrote:

> == Quote from Jerry Quinn (jlquinn at optonline.net)'s article
> > Iain Buclaw Wrote:
> > > == Quote from Mike Farnsworth (mike.farnsworth at gmail.com)'s article
> > > > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > > > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > > > the code generator did with it.  It was able to partially drop into SSE
> > > > registers and instructions, but not as well as I had hoped from writing
> > > > "regular" D code.
> > > > I poked through the builtins that get pulled into d-builtins.c /
> > > > d-builtins2.cc but I don't see anything that might be pulling in
> > > > definitions such as __builtin_ia32_* for SSE, for example.
> > > > How hard would it be to get some sort of vector attribute attached to a
> > > > type (or just plain introduce v4sf, __m128, or something like that) and
> > > > get those SIMD builtins available?
> > >
> > > That said, a workaround is to use array types.
> > > typedef float[4] __m128;
> > > typedef float[4] __v4sf;
> > >
> > >
> > > All the more reason to show you that pragma(attribute) is still too incomplete
> > > to be usable. Any ideas to improve it are welcome though. :)
> > The workaround actually looks like a cleaner way to define types for vector
> > intrinsics.  How hard would it be to export vector intrinsics so the API expects
> > float[4], for example?
> 
> I haven't given much thought to what the internal representation could be, but I'd
> lean towards using unions in D code for usage in the language, as that's probably
> the most portable.
> 
> For example, one of the older 'hello vectors' I know of:
> 
> import std.c.stdio;
> 
> pragma(set_attribute, __v4sf, vector_size(16));
> typedef float __v4sf;
> 
> union f4vector
> {
>     __v4sf v;
>     float[4] f;
> }
> 
> int main()
> {
>     f4vector a, b, c;
> 
>     a.f = [1, 2, 3, 4];
>     b.f = [5, 6, 7, 8];
> 
>     c.v = a.v + b.v;
>     printf("%f, %f, %f, %f\n", c.f[0], c.f[1], c.f[2], c.f[3]);
> 
>     return 0;
> }
> 
> 
> Compile: gdc -c -g -msse hellovector.d
> Dump Object: objdump -dS hellovector.o
> 
> And the output of the SIMD operation speaks for itself:
> 
> c.v = a.v + b.v;
>   xorps  %xmm1,%xmm1
>   movlps %gs:0x0,%xmm1
>   movhps %gs:0x8,%xmm1
>   xorps  %xmm0,%xmm0
>   movlps %gs:0x0,%xmm0
>   movhps %gs:0x8,%xmm0
>   addps  %xmm1,%xmm0
>   movlps %xmm0,%gs:0x0
>   movhps %xmm0,%gs:0x8
> 
> 
> Regards.
> Iain

Huh, that's actually pretty promising.  Hooray for gcc's vector ops. =)
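
For comparison, here is a minimal sketch of the same "hello vectors" idea written in C against gcc's vector_size attribute, which I assume is what the pragma in the D version maps onto.  The names v4sf, f4_add and f4_lane are made up for illustration; they are not from gdc or any library.

```c
#include <assert.h>

/* GCC vector extension: four packed floats, 16 bytes in total. */
typedef float v4sf __attribute__((vector_size(16)));

/* Same layout trick as the D union: one view as a vector, one as lanes. */
union f4vector {
    v4sf  v;
    float f[4];
};

/* Element-wise add; with SSE enabled gcc can lower this to an addps. */
static union f4vector f4_add(union f4vector a, union f4vector b)
{
    union f4vector c;
    c.v = a.v + b.v;
    return c;
}

/* Read one lane back through the array view of the union. */
static float f4_lane(union f4vector x, int lane)
{
    assert(lane >= 0 && lane < 4);
    return x.f[lane];
}
```

Adding {1, 2, 3, 4} and {5, 6, 7, 8} this way gives lanes 6, 8, 10, 12, the same result the D program prints.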

I suppose I should still try to beat up on the __builtin_ia32_* stuff to make sure that can work, but if the codegen already gets us that far then that's pretty good.  With a little -O3 it might even clean up some of the extraneous stuff, especially with a sequence of vector operations.  The intrinsics on  will get us some of the more interesting things like movemasks, shuffles, vector compares, etc.
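
To make the "more interesting things" concrete, here is a rough C sketch of a vector compare, a movemask and a shuffle using gcc's own builtins.  It assumes gcc targeting x86 with SSE enabled (-msse, or x86-64 where it is on by default); the wrapper names are hypothetical, and whether gdc can expose these builtins the same way is exactly the open question.

```c
/* Assumption: gcc on x86 with SSE enabled (-msse; default on x86-64). */
typedef float v4sf __attribute__((vector_size(16)));

/* Vector compare: each lane becomes all-ones bits where a < b, else zero. */
static v4sf lanes_less_than(v4sf a, v4sf b)
{
    return __builtin_ia32_cmpltps(a, b);
}

/* Movemask: pack the sign bit of each lane into bits 0..3 of an int. */
static int lane_mask(v4sf m)
{
    return __builtin_ia32_movmskps(m);
}

/* Shuffle: immediate 0x1B selects lanes 3, 2, 1, 0, i.e. reversed order. */
static v4sf reverse_lanes(v4sf a)
{
    return __builtin_ia32_shufps(a, a, 0x1B);
}
```

With a = {1, 5, 3, 9} and b = {2, 4, 6, 8}, only lanes 0 and 2 satisfy a < b, so lane_mask(lanes_less_than(a, b)) comes out as 5 (binary 0101).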

As long as the union doesn't cause a bunch of load/store deadweight in the generated code, this might work nicely.  However, I'll bet dmdfe doesn't understand that __v4sf isn't really just a float, so at some point that will need to be fixed so that there is no accidental slicing, invalid array/structure sizes, etc.
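
A small C sketch of the size issue, assuming gcc's vector_size attribute: the vector type is 16 bytes wide even though its element type is float, so a front end that still treats it as a plain float would compute the wrong sizes and offsets.  The struct and helper names are invented for illustration.

```c
#include <stddef.h>

/* Four packed floats: 16 bytes, not sizeof(float). */
typedef float v4sf __attribute__((vector_size(16)));

union f4vector {
    v4sf  v;
    float f[4];
};

/* A scalar followed by a vector, to show the padding the layout must allow for. */
struct particle {
    float mass;
    v4sf  position;
};

/* Size of the vector type itself. */
static size_t vector_size_bytes(void)
{
    return sizeof(v4sf);
}

/* The union adds nothing on top of the vector's size. */
static size_t union_size_bytes(void)
{
    return sizeof(union f4vector);
}

/* Offset of the vector member; its alignment is target dependent. */
static size_t position_offset(void)
{
    return offsetof(struct particle, position);
}
```

Both sizes are 16 here.  The offset of position depends on the alignment the target gives the vector type, so it is typically larger than the 4 bytes a plain float member would need.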

-Mike