DMD backend quality (Was: Re: DIP 1031--Deprecate Brace-Style Struct Initializers--Community Review Round 1 Discussion)

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Feb 18 21:52:18 UTC 2020


On Tue, Feb 18, 2020 at 09:22:18PM +0000, matheus via Digitalmars-d wrote:
[...]
> Just to be clear, I am NOT diminishing any work, I just wasn't aware
> of these differences, because 20 to 30 or maybe 40% on runtime is
> something that you can't just ignore.

It depends on your use case.  YMMV, as they say.

IME, the difference is most pronounced for CPU-intensive code where you
have a lot of nested constructs, esp. in range-based code with extensive
use of UFCS chains.
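
To make that concrete, here's a purely hypothetical sketch of the kind of
UFCS chain I mean (the names are made up, nothing special about them).
Every link is a tiny function, so the whole pipeline can collapse into a
handful of instructions, but only if the inliner follows it all the way
down:

	import std.algorithm : filter, map, sum;
	import std.range : iota;

	// every stage here is a trivial function call; fully inlined, the
	// whole chain boils down to one small loop
	auto sumOfSquaresOfEvens(int n) {
		return iota(n)
			.filter!(x => x % 2 == 0)
			.map!(x => x * x)
			.sum;
	}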

Some years ago I analysed the assembly output to understand why this is
so.  According to my observations, one big factor is that the DMD
inliner is rather anemic: it gives up at the slightest complication, and
thereby misses out on what's commonly a domino effect of optimizations:
inlining the innermost function call opens up new optimization
opportunities that make the next level out inlineable as well, and so
on, until most of the call stack has been inlined.  Stop anywhere in
between, and you miss out on the rest of the call chain's worth of
optimizations.  In range-based code most of the method calls are very
small, so a long chain of nested calls can often be reduced to just a
few inlined instructions; this is where LDC's much more aggressive
inliner really shines.  In such cases a 20-30% performance difference is
almost guaranteed, and sometimes I've seen as much as 40%.  I just ran a
quick test again, as I posted in another reply, and once more I see a
>30% performance boost just from using ldc2 instead of dmd.
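
(For the record, there's nothing fancy about the comparison itself; it
amounts to something like the following, with bench.d standing in for
whatever hot-loop test you care about, and the usual optimization flags:)

	$ dmd  -O -release -inline bench.d && time ./bench
	$ ldc2 -O3 -release bench.d && time ./bench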

The LLVM optimizer has been known to collapse entire call chains into
the equivalent of a `return constValue;`: by evaluating LLVM IR at
compile time, it can fold a whole subtree of function calls into a
single constant.  DMD's inliner, by comparison, balks at something as
trivial as the difference between:

	auto myFunc(...) {
		if (cond) return result1;
		return result2;
	}

vs.

	auto myFunc(...) {
		if (cond) return result1;
		else return result2;
	}

so it tends to give up long before the code has been transformed into a
shape where it can be inlined one level further up the call chain.
Thus an entire domino chain of optimizations is missed, and you end up
with several nested function calls where the whole thing could have been
reduced to a few inlined instructions.  When that happens in an inner
loop, a 30% performance drop is pretty much to be expected.
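
For what it's worth, the shape the DMD inliner seems to cope with best
(in my experience; the exact heuristics vary between releases) is a body
that reduces to a single return expression.  A hypothetical illustration,
with made-up parameters, equivalent to both versions above:

	auto myFunc(bool cond, int result1, int result2) {
		// single return expression: the form the inliner likes best
		return cond ? result1 : result2;
	}

IIRC, marking such a function with pragma(inline, true) will make the
compiler complain when it cannot honour the request, which is a handy
way of finding out which shapes the inliner actually accepts.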


> I think that maybe is possible to balance things, like developing with
> DMD and generate final code with GDC/LDC to get faster runtime, well
> maybe some problems will happen mixing compilers and final result, but
> this is something that I will considerate for now on.
[...]

As somebody pointed out, using dmd for development and ldc2 for release
builds is definitely a viable approach.  Recent LDC releases have been
tracking DMD releases very closely, so barring rare corner cases and
bugs, the two compilers should be pretty much on par, functionality-wise.

And now that I've learned that ldmd2 doesn't preclude passing
LDC-specific flags on the command line, I might actually adopt this
approach.  (The trouble with differing command-line syntaxes is that the
compilers aren't drop-in replacements for each other, which makes
supporting multiple compilers in the same build script a pain: certainly
possible, and not even hard, but nonetheless a pain to write and
maintain.  With ldmd2 I can just standardize on DMD command-line syntax
and have everything Just Work(tm).  Best of both worlds.)
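
In concrete terms, the plan is simply something along these lines
(app.d standing in for the real build, flags being the usual ones):

	$ dmd   -O -release -inline app.d     # everyday development builds
	$ ldmd2 -O -release -inline app.d     # same flags, LDC backend for releases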


T

-- 
People demand freedom of speech to make up for the freedom of thought which they avoid. -- Soren Aabye Kierkegaard (1813-1855)

