Implementation of gcc SIMD builtins

Iain Buclaw ibuclaw at ubuntu.com
Thu Feb 10 05:08:49 PST 2011


== Quote from Mike Farnsworth (mike.farnsworth at gmail.com)'s article
> Sorry to start a new thread on this, but I didn't want it to get lost in
> the middle of the previous comments.  I have the start of a working
> implementation in gdc giving access to the __builtin_ia32_* functions.
> The way I did it so far manages to compile down to very tight SSE code.
> That's the good part.
> The bad part is that there are a couple of limitations that engender
> slightly ugly code at the moment.
> Issue 1:
> In order for the compiler to actually recognize the builtins when you
> call them, I had to define a set of custom types that represent gcc's
> types with the vector_size attribute that get passed into the builtins.
> I couldn't use float[4] for V4SF as I (and Iain) had hoped, because I can't
> set the alignment, and it automatically generates calls to _d_array_init
> and _d_array_copy and such, rather than just staying in SSE
> registers.  Let's just say the code was very non-optimal.

I didn't hope for anything, I'm not the crazy one using them. =)

> Instead, I create a struct declaration in gcc.builtins for all of the
> types expected by those builtins, and I name them to match what they
> would nominally contain: __v4sf would have 4 floats, __v32qi would have
> 32 bytes, __v2df would have 2 doubles, and so forth.  Each struct has
> 16-byte alignment and the correct size.  But here's the rub: they have
> no fields in them.  I tried my darndest to add VarDeclarations to them,
> but because the actual gcc tree type wasn't a struct, the compiler would
> just ICE when instantiating any of those structs.
> I'd like to fix this, so that you can literally access the contents of
> the struct like it had a float[4], or a double[2], or a byte[32], or
> whatever it should actually have; or instead it should give you an
> overloaded [] operator for direct indexing.
> Issue 2:
> The builtin structs I generate are *not* recognized by the frontend as
> having support for +, -, *, /, etc like the gcc vector_size types
> automatically do in C and C++.  I might be able to add those and have
> them contain code to drop into the builtins.  For now, you *must* use
> the builtin functions to perform operations on these types.  I'm
> obviously aiming to use the builtin functions, myself (for now).

Actually, the more I think about it, the more I feel a user-defined union would be
a better way to get around the shortcomings of gcc attribute support in gdc. Trying
to use whatever builtins gcc has to offer won't get you very far anytime soon.

There are one or two ICEs when using arithmetic operations (+, -, /, *, =) on typedef'd
types with vector attributes assigned to them. These have mostly been fixed in my
local tree (hopefully with kind error messages for invalid ops too), which will be
pushed soon after the next dmd release merge.

> Quick example:
> ///// File VectorsMain.d /////
> import gcc.builtins;
> import mmintrins;
> import std.stdio;
> void main()
> {
>     __v4sf bv1;
>     setvelem(bv1, 0, 1.0f);
>     setvelem(bv1, 1, 2.0f);
>     setvelem(bv1, 2, 3.0f);
>     setvelem(bv1, 3, 0.0f);
>     __v4sf bv2;
>     setvelem(bv2, 0, 1.0f);
>     setvelem(bv2, 1, 1.0f);
>     setvelem(bv2, 2, 1.0f);
>     setvelem(bv2, 3, 0.0f);
>     __v4sf bv3 = _mm_add_ps(bv1, bv2);
>     std.stdio.writefln("Result: (%s, %s, %s, %s)",
>                        velem!float(bv3, 0),
>                        velem!float(bv3, 1),
>                        velem!float(bv3, 2),
>                        velem!float(bv3, 3));
> }
> ///// File mmintrins.d /////
> module mmintrins;
> import gcc.builtins;
> T velem(T, VT)(VT vector, uint elem)
> {
>     return (cast(T*) &vector)[elem];
> }
> void setvelem(T, VT)(ref VT vector, uint elem, T value)
> {
>     (cast(T*) &vector)[elem] = value;
> }
> //pragma(set_attribute, _mm_add_ps, always_inline, artificial);
> T _mm_add_ps(T)(const(T) v1, const(T) v2)
> {
>     return __builtin_ia32_addps(v1, v2);
> }
> ///// End example /////
> Note a few things: I made _mm_add_ps templated on vector type (I'll
> constrain it eventually to appropriate types), and this solves a couple
> of problems: cross-module inlining works as the other module gets the
> whole definition, and you can technically addps types other than v4sf.
> Note the velem and setvelem methods are just to add a pretty face on the
> fact that the data of the struct is hidden, with no fields to access it.
> More checks are needed (at least in debug mode), and there will be some
> other handy things like _mm_set1_ps and _mm_set_ps to make rapid setup
> of vectors easier.  I'll admit that this part is a bit ugly, but it
> works, and it generates excellent code.  I compared the actual assembly
> generated to my own C++ code with the same intrinsics, and so far the D
> side is keeping up.
> Please don't collectively throw up when you see this...fast vector ops
> are kind of a big deal for me, so be gentle. =)  What do you all think?
> -Mike

I think I'm gonna throw up... :~)