SIMD support...
Martin Nowak
dawg at dawgfoto.de
Fri Jan 6 19:27:08 PST 2012
On Sat, 07 Jan 2012 01:06:21 +0100, Walter Bright
<newshound2 at digitalmars.com> wrote:
> On 1/6/2012 1:43 PM, Manu wrote:
>> There is actually. To the compiler, the intrinsic is a normal function,
>> with some hook in the code generator to produce the appropriate opcode
>> when it's performing actual code generation.
>> On most compilers, the inline asm on the other hand, is unknown to the
>> compiler, the optimiser can't do much anymore, because it doesn't know
>> what the inline asm has done, and the code generator just goes and pastes
>> your asm code inline where you told it to. It doesn't know if you've
>> written to aliased variables, called functions, etc., so it can no longer
>> safely rearrange code around the inline asm block, which means it's not
>> free to pipeline the code efficiently.
>
> And, in fact, the compiler should not try to optimize inline assembler.
> The IA is there so that the programmer can hand tweak things without the
> compiler defeating his attempts.
>
> For example, suppose the compiler schedules instructions for processor
> X. The programmer writes inline asm to schedule for Y, because the
> compiler doesn't specifically support Y. The compiler goes ahead and
> reschedules it for X.
>
> Arggh!
Yes, but that's not what I meant.
Consider:
__v128 a = load(1), b = load(2);
__v128 c = add(a, b);
__v128 d = add(a, b);
A valid optimization would be:
__v128 b = load(2);
__v128 a = load(1);
__v128 tmp = add(a, b);
__v128 d = tmp;
__v128 c = tmp;
__v128 load(int v) pure
{
__v128 res;
asm (res, v)
{
MOVD res, v;
SHUF res, 0x0000;
}
return res;
}
__v128 add(__v128 a, __v128 b) pure
{
__v128 res = a;
asm (res, b)
{
ADD res, b;
}
return res;
}
The compiler might drop the evaluation of d entirely and reuse the common
subexpression already computed for c. It might also evaluate d before c.
The important point is that these functions can be marked as free of side
effects, and the compiler can verify that claim if it classifies the
instructions used inside the asm blocks. The compiler can then apply all
kinds of optimizations at the expression level.
After inlining, it would look like this:
__v128 b;
asm (b) { MOV b, 2; }
__v128 a;
asm (a) { MOV a, 1; }
__v128 tmp;
asm (a, b, tmp) { MOV tmp, a; ADD tmp, b; }
__v128 c;
asm (c, tmp) { MOV c, tmp; }
__v128 d;
asm (d, tmp) { MOV d, tmp; }
The compiler then performs its usual register allocation, with one
constraint: every variable used in an asm block must be assigned a register
for that block.
This effectively achieves the same result as writing the code with
intrinsics. It also makes inline asm blocks much easier to compose.
>
> What dmd does do with the inline assembler is it keeps track of which
> registers are read/written, so that effective register allocation can be
> done for the non-asm code.
Which is why the compiler should be the one to allocate pseudo-registers.
More information about the Digitalmars-d mailing list