inlining...
John Colvin
john.loughran.colvin at gmail.com
Fri Mar 14 05:02:52 PDT 2014
On Friday, 14 March 2014 at 11:04:34 UTC, Manu wrote:
> On 14 March 2014 18:03, John Colvin
> <john.loughran.colvin at gmail.com> wrote:
>
>> As much as I like the idea:
>>
>> Something always tells me this is the compiler's job... What clever
>> reasoning are you applying that the compiler's inliner can't? It
>> seems like a different situation to, say, SIMD code, where correctly
>> structuring loops can require a lot of gymnastics that the compiler
>> can't or won't (floating point conformance) do. The inlining decision
>> seems easily automatable in comparison.
>>
>> I understand that unoptimised builds for debugging are a problem, but
>> a sensible compiler lets you hand-pick your optimisation passes.
>>
>> In short: why are compilers not good enough at this that the
>> programmer needs to be involved?
>>
>
> The compiler applies generalised heuristics, which are certainly for
> the 'common' case, whatever that happens to be. The compiler simply
> doesn't know what you're doing, so it's very hard for the compiler to
> do anything really intelligent.
> Inlining heuristics are fickle, and they also don't know what you're
> actually trying to do. Is a function 'long'? How long is 'long'? Is
> the function 'hot'? Do we prefer code size or execution speed? Is the
> function called only from this location, or is it used in many
> locations? Etc.
> Inlining is one of the fuzziest pieces of logic in the compiler, and
> relies on a lot of information that is impossible for the compiler to
> deduce, so it applies heuristics to try and do a decent job, but it's
> certainly not perfect.
>
> I argue that nothing so fickle can exist in the language without a
> manual override. Especially not in a native language.
>
> In my current case, the functions I need to inline are not exactly
> trivial. They're really pushing the boundaries of the compiler's
> inliner heuristics, and then I'm calling a series of such functions
> that operate on parallel data.
> If they don't inline, the performance equals the sum of the functions
> plus some overhead. If they all inline, the cost is only that of the
> longest one, with no overhead (the others fill in pipeline gaps).
> Further, some of these functions embed some shared work... if they
> don't inline, this work is repeated. If they do inline, the redundant
> repeated work is eliminated.
>
> My experiments with std.algorithm were a failure. I realised quickly
> that I couldn't rely on the inliner to do a satisfactory job, and the
> optimiser was unable to do its job properly.
> std.algorithm could really benefit from the mixin suggestion, since
> things like predicate functions are always trivial, usually supplied
> as little lambdas, and inlining isn't reliable, especially in debug
> builds.
> Something like algorithm loop sugar shouldn't run heaps worse than an
> explicit loop just because it happens to be implemented by a generic
> function.
Thanks for the explanations.
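To make the shared-work point concrete, here's a rough sketch (the
functions, names and the shared 1.0f / range factor are made up purely
for illustration, not taken from any real code):

float scaleA(float x, float range) { return (1.0f / range) * x * 2.0f; }
float scaleB(float x, float range) { return (1.0f / range) * x + 1.0f; }

float combined(float x, float range)
{
    // If scaleA and scaleB both inline here, the compiler can compute
    // 1.0f / range once and share it between them; across real call
    // boundaries that redundant work stays.
    return scaleA(x, range) + scaleB(x, range);
}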
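The std.algorithm situation looks roughly like this (a hedged sketch;
the lambdas are just stand-ins for the kind of trivial predicates
involved):

import std.algorithm : filter, map;
import std.array : array;

int[] viaAlgorithm(int[] xs)
{
    // Unless filter, map and the little lambdas all inline, this runs
    // well behind the hand-written loop below, especially in debug builds.
    return xs.filter!(x => x > 0)
             .map!(x => x * 2)
             .array;
}

int[] viaLoop(int[] xs)
{
    int[] result;
    foreach (x; xs)
        if (x > 0)
            result ~= x * 2;
    return result;
}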
Another use case is to aid propagation of compile-time information for
optimisation.
A function might look like a poor candidate for inlining for other
reasons, but if there's an integer parameter coming in that is
statically known to the caller and will be used to decide a loop
length, inlining allows that information to be propagated to the
callee. Static loop lengths => well-optimised loops, with
opportunities for optimal unrolling. Even quite a large function can
be a good choice to inline in that situation.
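A minimal sketch of what I mean, using a template parameter as a
stand-in for the constant that inlining would propagate (sumFixed and
sumDynamic are hypothetical names, not existing functions):

float sumFixed(size_t N)(const(float)[] data)
{
    float total = 0;
    foreach (i; 0 .. N)   // N is known at compile time: unrollable
        total += data[i];
    return total;
}

float sumDynamic(const(float)[] data, size_t n)
{
    float total = 0;
    foreach (i; 0 .. n)   // n is only known at run time
        total += data[i];
    return total;
}

// A call like sumFixed!8(buffer) gives the optimiser the same
// information that inlining a call with a literal argument would.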
I don't know how good compilers are at taking this sort of thing
into account already.