Lack of optimisation when returning the address of inner functions

Fri Sep 4 01:10:48 UTC 2020

This question is about a peculiar missed optimisation that occurs 
only in one rather odd case.

Example (see https://d.godbolt.org/z/54eaGd); either LDC or GDC 
can be used, as the results are the same here:

auto test2() {
     int a = 20;
     int foo() { return a + 5; } // inner function
     return &foo;  // other way to construct a delegate
}

auto bar() {
     return test2()();  // call the delegate returned by test2
}

Now, inspecting the code generated by LDC or GDC, the code for 
foo is literally just { return 25; }. Yet when test2 is called, 
that generated code for foo is not used; instead the generated 
code is:

     call _d_allocmemory
     mov dword ptr [rax], 20
     mov rdx, foo
     ret
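For reference, this assembly looks like D's usual handling of an escaping delegate: the frame holding `a` is allocated on the GC heap, 20 is stored into it, and the delegate is returned as a (context pointer, function pointer) pair in rax:rdx. A rough hand-written model of that lowering follows; the struct `Frame` and function `test2Lowered` are hypothetical names for illustration, not the compiler's actual internal representation.

```d
// Hypothetical model of the closure that test2 builds; Frame and
// test2Lowered are illustrative names, not what the compiler emits.
struct Frame
{
    int a;                           // the captured local from test2
    int foo() { return a + 5; }      // stands in for the nested function
}

int delegate() test2Lowered()
{
    auto frame = new Frame(20);      // heap-allocated frame, like _d_allocmemory + store of 20
    return &frame.foo;               // delegate: .ptr is the frame, .funcptr is Frame.foo
}

void main()
{
    auto dg = test2Lowered();
    assert(dg.ptr !is null);         // the context pointer refers to the heap frame
    assert(dg() == 25);
}
```

An optimiser that inlines this and calls the delegate immediately would have to prove the heap frame unobservable before removing the allocation.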

1. So why the missed optimisation? The compiler could simply have 
got rid of the delegate construction in test2a when it is inlined 
into bar (which is done sanely [!] in the generated code for 
test2a).

2. Even weirder, if you delete the & from &foo, leaving simply 
"return foo;", the missed optimisation goes away. Why?

3. What’s the difference between foo and &foo?

4. Leaving aside the special case above where the inner 
function’s address is returned, surely in many cases an inner 
function could be converted into an ordinary function, or simply 
_inlined_ so that there is no function at all? That is what the 
standalone code generated for foo suggests.
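A small sketch of the distinction behind questions 2 to 4, assuming standard D semantics (the names `viaCall`, `viaAddress` and `staticAddress` are hypothetical). A zero-argument function named without parentheses is called, so `return foo;` returns an `int` rather than a delegate. `&foo` on a nested function that reads a local yields a delegate, and because that delegate escapes, the enclosing frame must live on the GC heap. A `static` nested function has no context pointer, so its address is an ordinary function pointer.

```d
// Sketch of `foo` versus `&foo` for nested functions (names are hypothetical).

int viaCall()
{
    int a = 20;
    int foo() { return a + 5; }
    return foo;                          // no parentheses needed: this calls foo, result is int
}

int delegate() viaAddress()
{
    int a = 20;
    int foo() { return a + 5; }
    return &foo;                         // delegate escapes, so the frame holding `a` is GC-allocated
}

int function() staticAddress()
{
    enum a = 20;                         // manifest constant, not a frame variable
    static int foo() { return a + 5; }   // static: no context pointer
    return &foo;                         // plain function pointer, no closure
}

void main()
{
    static assert(is(typeof(viaCall()) == int));
    static assert(is(typeof(viaAddress()) == int delegate()));
    static assert(is(typeof(staticAddress()) == int function()));
    assert(viaCall() == 25);
    assert(viaAddress()() == 25);
    assert(staticAddress()() == 25);
}
```

Whether LDC or GDC can then remove the heap frame after inlining `viaAddress` into a caller is an optimiser question that this sketch does not settle.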