Support for gcc vector attributes, SIMD builtins

Tue Feb 1 10:32:57 PST 2011

Iain Buclaw Wrote:

> == Quote from Mike Farnsworth (mike.farnsworth at gmail.com)'s article
> > I built gdc from tip on Fedora 13 (x86-64) and started playing around
> > with creating a vector struct (x,y,z,w) to see what kind of optimization
> > the code generator did with it.  It was able to partially drop into SSE
> > registers and instructions, but not as well as I had hoped from writing
> > "regular" D code.
> > I poked through the builtins that get pulled into d-builtins.c /
> > d-builtins2.cc but I don't see anything that might be pulling in
> > definitions such as __builtin_ia32_* for SSE, for example.
> > How hard would it be to get some sort of vector attribute attached to a
> > type (or just plain introduce v4sf, __m128, or something like that) and
> > get those SIMD builtins available?
> > For the curious, here are how they are defined in, for example,
> > xmmintrin.h for gcc:
> > typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
> > typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> > extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > _mm_add_ps (__m128 __A, __m128 __B)
> > {
> >   return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B);
> > }
> 
> Although GDC hashes out GCC builtins and attributes, most of it is very much
> incomplete. For example, a D version (for GDC) of the code above would be
> something like:
> 
> 
> import gcc.builtins;
> 
> pragma(set_attribute, __m128, vector_size(16), may_alias);
> pragma(set_attribute, __v4sf, vector_size(16));
> pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> 
> typedef float __m128;
> typedef float __v4sf;
> 
> __m128 _mm_add_ps (__m128 __A, __m128 __B)
> {
>     return cast(__m128) __builtin_ia32_addps (cast(__v4sf)__A, cast(__v4sf)__B);
> }
> 
> 
> 
> However, this doesn't work because
> 
> 1) There is no 128-bit float type in DMDFE (can be put in though, even if it is
> just for internal use).
> 2) Vectors are not representable in DMDFE.
> 
> So __builtin_ia32_addps (and many other ia32 builtins) cannot be emitted to the D
> environment.

I figured this would be the case; the "typedef float whatever __attribute__((vector_size(16)))" syntax is already unusual even in C, so I don't expect dmdfe to do the right thing with anything similar.
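For comparison, here is a minimal C sketch of what that attribute provides on the gcc side, using only the generic vector extension and no x86-specific builtin. The helper names v4sf_add and v4sf_lane are hypothetical and exist only for illustration; the sketch assumes gcc or a compatible compiler such as clang.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* memcpy */

/* GCC vector extension: four packed floats, 16 bytes in total. */
typedef float v4sf __attribute__((vector_size(16)));

static_assert(sizeof(v4sf) == 4 * sizeof(float), "v4sf must hold four floats");

/* Hypothetical helper: element-wise add. The compiler chooses the
 * instruction for the target (addps on x86 with SSE enabled). */
static v4sf v4sf_add(v4sf a, v4sf b)
{
    return a + b;
}

/* Hypothetical helper: read lane i (0..3) by copying the vector into a
 * plain array, so nothing depends on vector subscripting support. */
static float v4sf_lane(v4sf v, int i)
{
    float tmp[4];
    memcpy(tmp, &v, sizeof tmp);
    return tmp[i];
}
```

The point is that the vector-ness lives entirely in the type, which is exactly the part a front end has to be able to represent before any builtin taking such a type can be exposed.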

> Interestingly enough, this particular example actually ICEs the compiler. It
> appears that while *explicit* casting is done in the code, DMDFE actually
> *ignores* this, which is terrible on DMD's part...

Hah.  It's obvious dmdfe doesn't understand the builtin's signature correctly, so I'll hold off on a bug report until I can figure out what kind of signature that builtin has registered with dmdfe.

> That said, a workaround is to use array types.
> typedef float[4] __m128;
> typedef float[4] __v4sf;
> 
> 
> This just goes to show that pragma(attribute) is still far too incomplete to
> rely on. Any ideas to improve it are welcome though. :)

In my (not very abundant) spare time, I'll poke around the attribute stuff to see if I can attach the vector_size(16) attribute to a float[4] array type.  I know the __builtin_ia32_addps function, for example, takes a v4sf (__m128 is just Intel's version that can change personalities at will; I feel no inclination to keep it around, and would rather use more strictly defined types and cast intrinsics).  If I can get that builtin to take a typedef'd float[4] without a cast, perhaps dmdfe will not drop any data and the codegen will happen properly.
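A rough C sketch of that idea, written in C rather than D because the D side does not work yet: callers only ever see float[4], and the vector type stays internal to the wrapper. The name add4 is hypothetical. The preprocessor guard is an assumption: it calls __builtin_ia32_addps only when building with gcc for a target with SSE enabled, and otherwise falls back to the generic vector add.

```c
#include <string.h>   /* memcpy */

/* Internal vector type; never exposed to callers. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical wrapper: adds two float[4] arrays lane by lane. */
static void add4(const float a[4], const float b[4], float out[4])
{
    v4sf va, vb, vr;

    /* Copy the arrays into vector values without any pointer casts. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);

#if defined(__SSE__) && defined(__GNUC__) && !defined(__clang__)
    vr = __builtin_ia32_addps(va, vb);  /* the builtin discussed above */
#else
    vr = va + vb;                       /* generic vector-extension fallback */
#endif

    memcpy(out, &vr, sizeof vr);
}
```

Whether gdc can offer the same thing without the copies depends on the builtin's registered signature accepting the array type directly, which is the open question here.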

Where do I look to see the attribute pragmas in gdc?  Where do I look to potentially change the signature that dmdfe sees for the __builtin_ia32_* functions?  If I can get a hand-coded signature to work, then we'll be in business.

-Mike