SIMD support...

Fri Jan 6 19:27:08 PST 2012

On Sat, 07 Jan 2012 01:06:21 +0100, Walter Bright  
<newshound2 at digitalmars.com> wrote:

> On 1/6/2012 1:43 PM, Manu wrote:
>> There is actually. To the compiler, the intrinsic is a normal function, with
>> some hook in the code generator to produce the appropriate opcode when it's
>> performing actual code generation.
>> On most compilers, the inline asm on the other hand, is unknown to the compiler,
>> the optimiser can't do much anymore, because it doesn't know what the inline asm
>> has done, and the code generator just goes and pastes your asm code inline where
>> you told it to. It doesn't know if you've written to aliased variables, called
>> functions, etc.. it can no longer safely rearrange code around the inline asm
>> block.. which means it's not free to pipeline the code efficiently.
>
> And, in fact, the compiler should not try to optimize inline assembler.  
> The IA is there so that the programmer can hand tweak things without the  
> compiler defeating his attempts.
>
> For example, suppose the compiler schedules instructions for processor  
> X. The programmer writes inline asm to schedule for Y, because the  
> compiler doesn't specifically support Y. The compiler goes ahead and  
> reschedules it for X.
>
> Arggh!

Yes, but that's not what I meant.

Consider

__v128 a = load(1), b = load(2);
__v128 c = add(a, b);
__v128 d = add(a, b);

A valid optimization could be:

__v128 b = load(2);
__v128 a = load(1);
__v128 tmp = add(a, b);
__v128 d = tmp;
__v128 c = tmp;

with load and add defined as:

__v128 load(int v) pure
{
     __v128 res;
     asm (res, v)
     {
         MOVD res, v;
         SHUF res, 0x0000;
     }
     return res;
}

__v128 add(__v128 a, __v128 b) pure
{
     __v128 res = a;
     asm (res, b)
     {
         ADD res, b;
     }
     return res;
}

The compiler might drop the evaluation of d and just reuse the common
subexpression already computed for c.
It might also evaluate d before c.
The important point is to mark those functions as side-effect free,
which the compiler can verify if the instructions are classified.
The compiler can then do all kinds of optimizations at the expression level.
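
As a rough illustration of what such a marking buys in a language that exists
today, here is a sketch in C using the GCC/Clang vector extensions. The type
name v128 (standing in for __v128), the helper demo, and the choice of
__attribute__((const)) as the "no side effects" marker are assumptions made
for this example, not part of the syntax proposed here.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes in 128 bits (GCC/Clang vector extension).
   Assumed stand-in for the __v128 type used above. */
typedef int32_t v128 __attribute__((vector_size(16)));

/* const: the result depends only on the arguments and there are no
   side effects, so the compiler is allowed to merge or reorder calls. */
__attribute__((const)) static v128 load(int32_t v)
{
    v128 res = { v, v, v, v };   /* broadcast v into every lane */
    return res;
}

__attribute__((const)) static v128 add(v128 a, v128 b)
{
    return a + b;                /* lane-wise addition */
}

/* The example from above: c and d are the same pure expression, so the
   optimizer may compute add(a, b) once and reuse the result. */
static int32_t demo(void)
{
    v128 a = load(1), b = load(2);
    v128 c = add(a, b);
    v128 d = add(a, b);
    for (int i = 0; i < 4; i++)
        assert(c[i] == d[i]);    /* merging the calls cannot change this */
    return c[0] + d[0];
}
```

Whether the second add is actually dropped depends on the compiler and the
optimization level; the attribute only gives it permission to do so.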

After inlining it would look like this:

__v128 b;
asm (b) { MOV b, 2; }
__v128 a;
asm (a) { MOV a, 1; }
__v128 tmp;
asm (a, b, tmp) { MOV tmp, a; ADD tmp, b; }
__v128 c;
asm (c, tmp) { MOV c, tmp; }
__v128 d;
asm (d, tmp) { MOV d, tmp; }

The compiler then does the usual register allocation, except that
every variable must be assigned a register in the asm blocks it
is used in.

This effectively achieves the same as writing it with intrinsics.
It also greatly improves the composition of inline asm.
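
For comparison, GCC-style extended asm in C works roughly along these lines
already: the operands are named in the asm statement and the compiler chooses
the registers. The following is only a sketch, assuming GCC or Clang; the asm
path is taken on x86 targets with SSE2, and other targets fall back to plain
vector code. The names v128, load_asm, add_asm and demo_asm are made up for
this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for __v128: four 32-bit lanes. */
typedef int32_t v128 __attribute__((vector_size(16)));

static v128 load_asm(int32_t v)
{
    v128 res;
#if defined(__SSE2__)
    /* "=x": the compiler picks an SSE register for res; "r": a general
       register for v. The asm is not volatile, so it is treated as a
       function of its inputs and may be merged, moved or dropped. */
    __asm__("movd %1, %0\n\t"
            "pshufd $0, %0, %0"
            : "=x"(res)
            : "r"(v));
#else
    res = (v128){ v, v, v, v };  /* portable fallback */
#endif
    return res;
}

static v128 add_asm(v128 a, v128 b)
{
    v128 res = a;
#if defined(__SSE2__)
    /* "+x": res is read and written in a register of the compiler's choice. */
    __asm__("paddd %1, %0" : "+x"(res) : "x"(b));
#else
    res += b;                    /* portable fallback */
#endif
    return res;
}

/* Same shape as the c/d example: two identical additions. */
static int32_t demo_asm(void)
{
    v128 c = add_asm(load_asm(1), load_asm(2));
    v128 d = add_asm(load_asm(1), load_asm(2));
    for (int i = 0; i < 4; i++)
        assert(c[i] == 3 && d[i] == 3);
    return c[0] + d[0];
}
```

Because every input and output is declared and nothing is marked volatile,
the compiler is free to allocate registers across the blocks and may combine
the duplicated work, much as it would with intrinsics.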

>
> What dmd does do with the inline assembler is it keeps track of which  
> registers are read/written, so that effective register allocation can be  
> done for the non-asm code.

Which is why the compiler should be the one to allocate pseudo-registers.