dmd codegen improvements

tsbockman via Digitalmars-d digitalmars-d at puremagic.com
Wed Aug 19 15:15:44 PDT 2015


On Tuesday, 18 August 2015 at 10:45:49 UTC, Walter Bright wrote:
> Martin ran some benchmarks recently that showed that ddmd 
> compiled with dmd was about 30% slower than when compiled with 
> gdc/ldc. This seems to be fairly typical.
>
> I'm interested in ways to reduce that gap.
>
> There are 3 broad kinds of optimizations that compilers do:
>
> 1. source translations like rewriting x*2 into x<<1, and 
> function inlining
>
> 2. instruction selection patterns like should one generate:
>
>     SETC AL
>     MOVZX EAX,AL
>
> or:
>     SBB EAX,EAX
>     NEG EAX
>
> 3. data flow analysis optimizations like constant propagation, 
> dead code elimination, register allocation, loop invariants, 
> etc.
>
> Modern compilers (including dmd) do all three.
>
> So if you're comparing code generated by dmd/gdc/ldc, and 
> notice something that dmd could do better at (1, 2 or 3), 
> please let me know. Often this sort of thing is low hanging 
> fruit that is fairly easily inserted into the back end.
>
> For example, recently I improved the usage of the SETcc 
> instructions.
>
> https://github.com/D-Programming-Language/dmd/pull/4901
> https://github.com/D-Programming-Language/dmd/pull/4904
>
> A while back I improved usage of BT instructions, the way 
> switch statements were implemented, and fixed integer divide by 
> a constant with multiply by its reciprocal.

I lack the assembly language skills to determine the cause(s) 
myself, but my 
[CheckedInt](https://github.com/tsbockman/CheckedInt) benchmark 
runs about 10x slower when compiled with DMD than when compiled 
with GDC. I'm sure there's some low-hanging fruit in there 
somewhere...

Note that while it's far from being a minimal test case, the 
runtime code is nowhere near as complicated as it might appear at 
first - the vast majority of the complexity in the code is 
compile-time logic. I could produce a similar example without 
most of the compile-time obfuscation, if requested.

Also note that the speed difference has nothing to do with the 
use of core.checkedint intrinsics: the gap was already there 
before GDC implemented them.
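
For anyone unfamiliar with core.checkedint, here is a minimal sketch of 
the kind of runtime operation involved, assuming the `adds` function 
from druntime. The `Checked` struct and the `checkedAdd` helper are 
hypothetical names made up for illustration; they are not taken from 
the CheckedInt repository, whose actual code is more involved.

```d
import core.checkedint : adds;

/// Hypothetical result type: the wrapped sum plus an overflow flag.
struct Checked
{
    int value;
    bool overflowed;
}

/// Hypothetical helper: adds two ints and reports signed overflow.
Checked checkedAdd(int a, int b)
{
    bool overflow = false;
    // adds() sets the flag when the signed addition overflows and
    // leaves it untouched otherwise, so it must start out false.
    const sum = adds(a, b, overflow);
    return Checked(sum, overflow);
}

unittest
{
    assert(checkedAdd(1, 2).value == 3);
    assert(!checkedAdd(1, 2).overflowed);
    assert(checkedAdd(int.max, 1).overflowed);
}
```

Building with `dmd -unittest -main` runs the checks above. At runtime 
this boils down to an add plus a test of the overflow condition, which 
is why the generated code should be simple even though the 
compile-time logic around it is not.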

