__restrict, architecture intrinsics vs asm, consoles, and other stuff

Iain Buclaw ibuclaw at ubuntu.com
Sat Sep 24 05:37:34 PDT 2011


== Quote from Manu Evans (turkeyman at gmail.com)'s article
> > > How can I do this in a nice way in D? I'm long sick of writing
> > > unsightly vector classes in C++, but fortunately using vendor
> > > specific compiler intrinsics usually leads to decent code
> > > generation. I can currently imagine an equally ugly (possibly worse)
> > > hardware vector library in D, if it's even possible. But perhaps
> > > I've missed something here?
> > Your C++ vector code should be amenable to translation to D, so that effort of
> > yours isn't lost, except that it'd have to be in inline asm rather than
> > intrinsics.
> But sadly, in that case, it wouldn't work. Without an intrinsic hardware vector
> type, there's no way to pass vectors to functions in registers, and also, using
> explicit asm, you tend to end up with endless unnecessary loads and stores, and
> potentially a lot of redundant shuffling/permutation. This will differ radically
> between architectures too.
> I think I read in another post too that functions containing inline asm will not
> be inlined?
> How does the D compiler go at optimising code around inline asm blocks? Most
> compilers have a lot of trouble optimising around inline asm blocks, and many
> don't even attempt to do so...
> How does GDC compare to DMD? Does it do a good job?
> I really need to take the weekend and do a lot of experiments I think.

For vector array operations, GDC is just the same as DMD (same runtime library
implementation).


You can define vector types in the language through use of GCC's vector_size
attribute though (it is a pragma in GDC), then use a union to interface between
the vector type and the corresponding static array.  It's deliberately UGLY and
PRONE to you hitting lots of brick walls if you don't handle them in a very
specific way though. :~)

Stock example:

// 16 bytes = 4 x 32-bit floats
pragma(attribute, vector_size(16))
  typedef float __v4sf_t;

union __v4sf {
  float[4] f;
  __v4sf_t v;
}


__v4sf a = {[1,2,3,4]},
       b = {[1,2,3,4]},
       c;

c.v = a.v + b.v;
assert(c.f == [2,4,6,8]);


The assignment compiles down to ~5 instructions:
movaps -0x88(%ebp),%xmm1
movaps -0x78(%ebp),%xmm0
addps  %xmm1,%xmm0
movaps %xmm0,-0x68(%ebp)
flds   -0x68(%ebp)

It is also far quicker than c[] = a[] + b[], because the vector add is inlined
rather than going through an external library call.
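
For comparison, here is a minimal sketch of the plain array-operation form
referred to above. The function name addArrays is made up for illustration, and
whether the operation ends up as a runtime library call or as inline code
depends on the compiler and its version, so treat this as a baseline to measure
against rather than a statement about what any particular compiler emits.

```d
// Hypothetical helper (name not from the original post): adds two
// 4-element float arrays using D's built-in array operations.
float[4] addArrays(float[4] a, float[4] b)
{
    float[4] c;
    // Element-wise add over all four elements via slice syntax.
    c[] = a[] + b[];
    return c;
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [1.0f, 2.0f, 3.0f, 4.0f];
    assert(addArrays(a, b) == [2.0f, 4.0f, 6.0f, 8.0f]);
}
```

Disassembling both versions (for example with objdump -d) is the easiest way to
check what your own build does with each form.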

Regards
Iain

