Force inline

Mon Feb 20 09:12:59 PST 2017

On Mon, Feb 20, 2017 at 05:16:15AM -0800, Jonathan M Davis via Digitalmars-d-learn wrote:
[...]
> Regardless, if performance is your #1 concern, then I would suggest
> that you compile with ldc and not dmd.
[...]

+1.  If you are concerned about performance enough to worry whether the
compiler will inline something, it's time to use gdc or ldc.  Dmd's
inliner is rudimentary at best, and its optimizer, while serviceable, is
not up to par with gdc's or ldc's.  If you want top performance, use
gdc / ldc.

IME gdc -O3 consistently produces code that runs about 20-30% faster
than code produced by dmd -O (even with -inline).  Sometimes I've seen
performance gains of up to 40-50%. This is especially likely when your
code consists of deep call trees involving small(ish) functions: I've
looked at the assembly output before and it seems that dmd's inliner
just gives up too easily, thus missing the opportunities for further
reductions and further inlining.  Even after discounting the inliner,
though, I find that gdc is simply better than dmd at loop optimizations
such as hoisting, strength reduction, and unrolling.  So if your code
involves complex loops, expect gdc -O3 to produce better code than dmd.
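
To make the "call tree of small functions" case concrete, here is a
minimal sketch in D.  The function names are made up for illustration.
pragma(inline, true) is the language's way of requesting inlining, but
how strictly each compiler honors it varies, and the build commands in
the comments are just the usual optimized invocations (app.d is a
placeholder file name), not something taken from this thread.

```d
// Possible builds to compare (app.d is a hypothetical file name):
//   dmd -O -inline app.d
//   gdc -O3 app.d
//   ldc2 -O3 app.d
import std.stdio : writeln;

// Request inlining of this small leaf function at its call sites.
pragma(inline, true)
int square(int x) pure nothrow @nogc @safe
{
    return x * x;
}

// A second small layer; inlining it exposes both square() calls
// to the optimizer at the call site in total().
pragma(inline, true)
int sumOfSquares(int a, int b) pure nothrow @nogc @safe
{
    return square(a) + square(b);
}

// The loop that benefits if the calls above are inlined: once the
// bodies are visible, square(1) can be folded to a constant.
int total(const(int)[] values) pure nothrow @nogc @safe
{
    int acc = 0;
    foreach (v; values)
        acc += sumOfSquares(v, 1);
    return acc;
}

void main()
{
    writeln(total([1, 2, 3])); // prints 17
}
```

Comparing the assembly each compiler emits for total() is one way to
see whether the calls were actually inlined.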

Well, "better" may be debatable, but gdc is certainly far more
aggressive than dmd at optimizing loops (and at optimizing in general).
In the cases I've looked at, aggressive optimization often exposes
further optimization opportunities, whereas an optimizer that is too
conservative misses the first opportunity and with it everything that
would have followed, so the resulting code can end up vastly different
in performance.

Having said all that, though, have you used a profiler to determine
whether or not your performance bottleneck is really at the function in
question?  I find that 90% of the time what I truly believe should be
inlined actually doesn't make much difference; the bottleneck is usually
somewhere else that I didn't expect.  I used to spend lots of time
trying to hyper-optimize everything, only to discover later that 90% of
my efforts had been wasted on gaining a meager 1% of performance,
whereas if I had just used a profiler in the first place, I would have
gotten a 50% performance improvement with only 10% of the effort.
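
As a rough illustration of measuring before optimizing, here is a
minimal sketch that times two stages separately with Phobos's
StopWatch.  Both stages and their names are hypothetical stand-ins, not
code from this thread, and hand timing like this is only a crude
substitute for a real profiler (dmd's -profile switch, for instance,
instruments functions and writes a trace.log).

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical small function one might be tempted to force-inline.
long helper(long x) pure nothrow @nogc @safe
{
    return x * 3 + 1;
}

// Hypothetical stage 1: arithmetic-only loop calling the helper.
long sumHelper(long n) pure nothrow @nogc @safe
{
    long acc = 0;
    foreach (i; 0 .. n)
        acc += helper(i);
    return acc;
}

// Hypothetical stage 2: repeated appends, which may reallocate.
size_t buildText(size_t n) @safe
{
    string s;
    foreach (i; 0 .. n)
        s ~= "x";
    return s.length;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);

    // Results are printed so the optimizer cannot discard the work.
    immutable a = sumHelper(1_000_000);
    writefln("sumHelper: result %s, %s usecs", a, sw.peek.total!"usecs");

    sw.reset(); // restart the measurement from zero
    immutable b = buildText(1_000_000);
    writefln("buildText: result %s, %s usecs", b, sw.peek.total!"usecs");
}
```

Whichever stage dominates in a measurement like this is the one worth
looking at first, regardless of which one looked suspicious on paper.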

T

-- 
Tech-savvy: euphemism for nerdy.