CTFE and Static If Question

H. S. Teoh hsteoh at quickfur.ath.cx
Thu May 7 19:18:40 UTC 2020


On Thu, May 07, 2020 at 05:34:21PM +0200, ag0aep6g via Digitalmars-d-learn wrote:
[...]
> Compared with LDC and GDC, DMD has a poor optimizer,
[...]

DMD's optimizer is the whipping boy around here, but to be fair, it is
poor only when it comes to aggressive inlining and optimizing loops.
Other than those two areas, it's actually a pretty decent optimizer.

And even within those two areas, it actually does do some degree of
optimization; it's not completely useless.  Its main fault is caused by
what I call the domino-effect of optimization: optimal code is often
arrived at by applying a chain of optimization steps, each of which
depends on (some of) the previous step(s). You can think of previous
optimization steps as a domino that knocks down subsequent dominoes
(enables further optimizations).  Conversely, if you miss an
optimization opportunity, it will also close the door to the subsequent
optimization steps. If the previous domino didn't fall, the subsequent
ones won't either.
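
To make the domino idea concrete, here's a minimal sketch (square and
sumOfSquares are made-up names for illustration, and I haven't checked
what any particular compiler version actually emits for this).  The
comments mark the steps an optimizer would have to chain together; if
the first one doesn't happen, the rest have nothing to work with:

```d
// Hypothetical example, names invented for illustration.
int square(int x) { return x * x; }

int sumOfSquares()
{
    int total = 0;
    foreach (i; 0 .. 4)
        total += square(i);  // domino 1: inline square() into the loop body
    // domino 2: with the call gone, the fixed-length loop can be unrolled
    // domino 3: the unrolled additions can be folded into one constant
    return total;
}

void main()
{
    assert(sumOfSquares() == 14);  // 0 + 1 + 4 + 9
}
```

If the call to square() stays a real function call, the loop body is
opaque to the later passes, and the function has to do the work at
runtime instead of just returning a constant.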

One of the problems with DMD's optimizer is that the inliner gives up
too easily, and too early in the process; as a result, it often closes
the door to further optimizations.  Its loop optimizer is also rather,
shall we say, timid, and so combined with the equally timid inliner, it
all adds up to a lot of missed optimization opportunities.  And given
that loops are usually where the bottlenecks are, these missed
opportunities add up to a big performance hit at runtime.

Comparatively, LDC and GDC's optimizers are extremely aggressive with
loop optimizations; as a result, they are able to reach past local
minima in computation space and obtain sometimes pretty major
performance gains. LDC in particular seems very aggressive with
inlining, especially at -O2 and -O3; this keeps the optimization
dominoes falling, enabling it to simplify a lot of code that DMD gives
up on too early in the process. Heavily range-based code especially
tends to show vastly different performance between the compilers,
because range methods tend to be small functions that, if inlined, lead
to further optimizations with the adjacent ranges in the UFCS pipeline.
But if one method fails to be inlined, it blocks all further
optimizations down the chain, causing a cascade of missed opportunities
and resulting in slow runtime code.
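
For reference, this is the kind of pipeline I have in mind; a small
sketch only (sumOfEvenSquares is a made-up name, and I'm not quoting
any benchmark numbers for it).  Every stage below boils down to tiny
empty/front/popFront methods, so the whole thing only collapses into a
tight loop if all of them get inlined:

```d
import std.algorithm : filter, map, sum;
import std.range : iota;

// Hypothetical function, for illustration only.
int sumOfEvenSquares(int n)
{
    return iota(n)                  // 0, 1, ..., n-1
        .filter!(x => x % 2 == 0)   // keep the even numbers
        .map!(x => x * x)           // square each one
        .sum;                       // add them up
}

void main()
{
    assert(sumOfEvenSquares(10) == 120);  // 0 + 4 + 16 + 36 + 64
}
```

If you want to see the difference for yourself, one way is to build the
same file with something like `dmd -O -inline -release` and with
`ldc2 -O3`, and compare the generated code for the function.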


T

-- 
A tree is held up by its roots, and a person by their friends.

