seeding the pot for 2.0 features [small vectors]

Mikola Lysenko mclysenk at mtu.edu
Mon Jan 29 06:33:56 PST 2007


Joel C. Salomon wrote:
> As I understand it, D’s inline assembler would be the tool to use for 
> this in a library implementation.  I don’t think the complex types use 
> SIMD, so the vectors can be the only things using those registers.
> 
>

I can tell you right now that this won't work.  I have tried using the 
inline assembler with a vector class and the speedup was barely 
noticeable.  You can see the results here:  http://assertfalse.com

Here are just a few of the things that become a problem for a library 
implementation:

1. Function calls

	Inline assembler cannot be inlined.  Period.  The compiler has to treat 
inline assembler as a sort of black box, which takes its inputs one way 
and returns its results another way.  It cannot poke around in there and 
change your hand-tuned opcodes in order to pass arguments more 
efficiently.  Nor can it change the way you allocate registers so that 
you don't accidentally trash the local frame.  It can't change where you 
put the result so that it can be used immediately by the next block of 
code.  Therefore any asm vector class will make a lot of wasteful 
function calls, and they quickly add up:


a = b + c * d;

becomes:

a = b.opAdd(c.opMul(d));


2. Register allocation

	This point is related to 1.  Most SIMD architectures have many 
registers, and a good compiler can use them to optimize things like 
parameter passing and function returns, keeping intermediate vectors in 
registers instead of memory.  This is impossible for a library to do, 
since its asm code has no knowledge of what the surrounding code is 
holding in registers as it executes.

3. Data alignment

	This is a big problem for libraries.  Most vector architectures require 
properly aligned data; SSE's aligned loads and stores, for example, fault 
on addresses that are not 16-byte aligned.  D only provides facilities for 
aligning members within a struct, not for aligning data to any absolute 
(global) boundary.  To fix this in D, we will need the compiler's help.  
With it, vectors declared in a function could be laid out so that they 
are properly aligned within each local call frame.

-Mik



More information about the Digitalmars-d mailing list