inlining...

Manu turkeyman at gmail.com
Fri Mar 14 04:04:22 PDT 2014


On 14 March 2014 18:03, John Colvin <john.loughran.colvin at gmail.com> wrote:

> As much as I like the idea:
>
> Something always tells me this is the compilers job... What clever
> reasoning are you applying that the compiler's inliner can't? It seems like
> a different situation to say SIMD code, where correctly structuring loops
> can require a lot of gymnastics that the compiler can't or won't (floating
> point conformance) do. The inlining decision seems easily automatable in
> comparison.
>
> I understand that unoptimised builds for debugging are a problem, but a
> sensible compiler lets you hand-pick your optimisation passes.
>
> In short: why are compilers not good enough at this that the programmer
> needs to be involved?
>

The compiler applies generalised heuristics, which are tuned for the
'common' case, whatever that happens to be.
The compiler simply doesn't know what you're actually doing, so it's very
hard for it to do anything really intelligent.

Inlining heuristics are fickle, and they also don't know what you're
actually trying to do.
Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we
prefer code size or execution speed? Is the function called only from this
location, or is it used in many locations? Etc.
Inlining is one of the fuzziest pieces of logic in the compiler, and it
relies on a lot of information that is impossible for the compiler to
deduce, so it applies heuristics to try to do a decent job, but it's
certainly not perfect.

I argue that nothing so fickle can exist in the language without a manual
override. Especially not in a native language.
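To make the kind of override I mean concrete, here's a rough sketch; the
pragma(inline, true) spelling is purely illustrative, not the exact syntax
being proposed in this thread:

    // A sketch of a per-function override: the programmer states the
    // intent directly, and the call overhead goes away regardless of
    // what the size/benefit heuristics would have decided.
    pragma(inline, true)
    float lerp(float a, float b, float t)
    {
        return a + (b - a) * t;
    }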

In my current case, the functions I need to inline are not exactly trivial.
They're really pushing the boundaries of the compiler's inliner heuristics,
and then I'm calling a series of such functions that operate on parallel
data.
If they don't inline, the performance equals the sum of the functions plus
some overhead. If they all inline, the performance is equal to only the
longest one, with no overhead (the others fill in the pipeline gaps).
Further, some of these functions embed some shared work... if they don't
inline, this work is repeated. If they do inline, the redundant repeated
work is eliminated.
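Something like this hypothetical shape (names and maths invented purely to
illustrate the point):

    import std.math : sqrt;

    // Both helpers recompute the same length. Across a call boundary
    // that work is simply done twice; if both inline into the loop
    // body, the optimiser can fold the duplicated sqrt into one and
    // interleave the remaining work to fill pipeline gaps.
    float projectX(float x, float y, float z)
    {
        float len = sqrt(x*x + y*y + z*z); // shared work
        return x / len;
    }

    float projectY(float x, float y, float z)
    {
        float len = sqrt(x*x + y*y + z*z); // the same work, repeated
        return y / len;
    }

    void transform(float[] xs, float[] ys, float[] zs,
                   float[] outX, float[] outY)
    {
        foreach (i; 0 .. xs.length)
        {
            outX[i] = projectX(xs[i], ys[i], zs[i]);
            outY[i] = projectY(xs[i], ys[i], zs[i]);
        }
    }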

My experiments with std.algorithm were a failure. I realised quickly that I
couldn't rely on the inliner to do a satisfactory job, and the optimiser
was unable to do its job properly.
std.algorithm could really benefit from the mixin suggestion, since things
like predicate functions are always trivial, usually supplied as little
lambdas, and inlining isn't reliable, especially in debug builds.
Something like algorithm loop sugar shouldn't run heaps worse than an
explicit loop just because it happens to be implemented by a generic
function.
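For a concrete (if contrived) picture of what I mean:

    import std.algorithm : map, sum;

    // The loop-sugar case: the whole cost hinges on the x => x * x
    // lambda (and map/sum themselves) being inlined. In a debug build
    // they typically aren't, and this runs far worse than the loop
    // below it.
    int sumOfSquares(int[] values)
    {
        return values.map!(x => x * x).sum;
    }

    // The explicit loop the sugar shouldn't lose to.
    int sumOfSquaresExplicit(int[] values)
    {
        int total = 0;
        foreach (v; values)
            total += v * v;
        return total;
    }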