dmd codegen improvements

Tue Aug 18 06:01:29 PDT 2015

On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
> Martin ran some benchmarks recently that showed that ddmd 
> compiled with dmd was about 30% slower than when compiled with 
> gdc/ldc. This seems to be fairly typical.
>
> I'm interested in ways to reduce that gap.
>
> There are 3 broad kinds of optimizations that compilers do:
>
> 1. source translations like rewriting x*2 into x<<1, and 
> function inlining
>
> 2. instruction selection patterns like should one generate:
>
>     SETC AL
>     MOVZ EAX,AL
>
> or:
>     SBB EAX,EAX
>     NEG EAX
>
> 3. data flow analysis optimizations like constant propagation, 
> dead code elimination, register allocation, loop invariants, 
> etc.
>
> Modern compilers (including dmd) do all three.
>
> So if you're comparing code generated by dmd/gdc/ldc, and 
> notice something that dmd could do better at (1, 2 or 3), 
> please let me know. Often this sort of thing is low hanging 
> fruit that is fairly easily inserted into the back end.
>
> For example, recently I improved the usage of the SETcc 
> instructions.
>
> https://github.com/D-Programming-Language/dmd/pull/4901
> https://github.com/D-Programming-Language/dmd/pull/4904
>
> A while back I improved usage of BT instructions, the way 
> switch statements were implemented, and fixed integer divide by 
> a constant with multiply by its reciprocal.
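
To make the last point concrete, here is a minimal C sketch of dividing by a constant via multiplication by its reciprocal. The function names `div10` and `check_div10` and the choice of divisor 10 are mine, for illustration only; this is not taken from dmd's back end, which may pick its constants and shifts differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned divide by 10 without a DIV instruction.
   0xCCCCCCCD is ceil(2^35 / 10); the 64-bit product shifted right by 35
   equals x / 10 for every 32-bit unsigned x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical self-check: compare against the ordinary division
   over a range of inputs plus the largest 32-bit value. */
static void check_div10(void)
{
    for (uint32_t x = 0; x < 100000u; ++x)
        assert(div10(x) == x / 10u);
    assert(div10(UINT32_MAX) == UINT32_MAX / 10u);
}
```

The multiply and shift replace a comparatively slow integer divide, which is why compilers apply this rewrite whenever the divisor is known at compile time.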

I've often looked at the assembly output of ICC.

One thing that struck me is that, by and large, it doesn't use 
PUSH, POP, or SETcc. Actually, I don't remember it ever emitting 
any of these instructions.

And indeed, using PUSH/POP/SETcc in assembly was often slower 
than the alternative. That is _way_ different from the old x86, 
where each of these instructions would gain speed.

Instead of PUSH/POP it would spill all registers to RBP-based 
locations (perhaps taking advantage of the register renamer?).

---------------

That said: I entirely agree with Vladimir about the codegen risk. 
DMD will always be used anyway because it compiles faster.