Can the D compiler detect final and optimize?

Sun Dec 27 19:33:08 UTC 2020

On Sunday, 27 December 2020 at 15:22:07 UTC, Iain Buclaw wrote:
> On Sunday, 27 December 2020 at 00:25:54 UTC, Johan wrote:
>> On Friday, 25 December 2020 at 19:29:26 UTC, Daniel Kozak 
>> wrote:
>>>
>>> I believe LTO could do that in some cases
>>
>> "Whole program optimization", indeed. Search for 
>> `fwhole-program` for GCC and Clang C++ compilers.
>> I don't know how it works for GDC (I think it may have an edge 
>> over LDC on this topic), but for LDC we don't tell the 
>> optimizer that our vtables are indeed vtables, so I'm not sure 
>> if devirtualization works as well as for Clang (which does 
>> inform the optimizer about vtables).
>>
>
> When `scope` is used, the compiler does constant propagation of 
> the vtable, because what the optimizer sees is:
>
>     MyClass __scopeinit = MyClass.init;
>     MyClass* var = &__scopeinit;
>     (var.__vptr + 40)();
>
> If the guts of _d_newclass were also made visible (such as, it 
> was templatized in object.d), then such devirtualization 
> through constant propagation would also occur for simple cases 
> of classes new'd on the GC.

LDC already does this: it overwrites the vtable ptr after calling 
_d_newclass, which is redundant but does help with 
devirtualization.
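
A small D sketch of the two allocation forms being compared. 
`MyClass`, `foo`, `callOnScope` and `callOnHeap` are hypothetical 
names chosen to match the quoted pseudocode. Whether either call is 
actually devirtualized depends on the compiler and optimization 
level, so read the comments as what this thread expects, not as a 
guarantee.

```d
class MyClass
{
    int foo() { return 42; } // virtual by default in D
}

int callOnScope()
{
    // `scope` puts the instance on the stack, so the optimizer can see
    // the vtable pointer being initialized and propagate it to the call.
    scope var = new MyClass();
    return var.foo();
}

int callOnHeap()
{
    // GC allocation goes through the runtime (_d_newclass); the call can
    // only be devirtualized if a vtable pointer store is visible here.
    auto var = new MyClass();
    return var.foo();
}

void main()
{
    assert(callOnScope() == 42);
    assert(callOnHeap() == 42);
}
```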

> Other than that, I'm not sure about other ways to do full 
> devirtualisation of method calls, you are more likely thinking 
> of speculative devirtualization

No. I meant that if you know (or assume you know) the full 
inheritance hierarchy, you can devirtualize this:

void foo(A a) {
   a.some_virtual_call();
}
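
To illustrate what knowing the full hierarchy buys, here is a sketch 
with hypothetical classes (`AFinal` and `fooFinal` are made up for 
the comparison). If the optimizer could prove that nothing derives 
from `A`, the call in `foo` could be bound directly, which is the 
same guarantee that `final` states explicitly.

```d
class A
{
    int some_virtual_call() { return 1; } // virtual: a subclass may override it
}

// Hypothetical twin of A; `final` states that no subclass exists.
final class AFinal
{
    int some_virtual_call() { return 1; }
}

int foo(A a)
{
    // Indirect call through the vtable, unless the optimizer knows (or
    // assumes) that no class in the program overrides the method.
    return a.some_virtual_call();
}

int fooFinal(AFinal a)
{
    // Can be a direct call, and therefore a candidate for inlining.
    return a.some_virtual_call();
}

void main()
{
    assert(foo(new A) == 1);
    assert(fooFinal(new AFinal) == 1);
}
```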

> speculative devirtualization, which looks something like this:
>
>     if ((&var.foo).funcptr is &MyClass.foo)
>         MyClass.foo();
>     else
>         var.foo();
>
> If the direct call is inlined, and if the condition is true, 
> then the resulting code may run about 3-5x faster for simple 
> functions.
>
> Though these days, I think most CPUs have branch prediction for 
> even indirect calls, so if no optimization happens, the 
> speculative devirtualization like the above will just consume 
> code space and branch prediction buffers.

Not so: 
http://johanengelen.github.io/ldc/2016/04/13/PGO-in-LDC-virtual-calls.html

The reason is that devirtualization enables other optimizations 
(like inlining, which in turn creates new profitable 
optimizations).
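
The guarded call quoted above can be written as compilable D roughly 
like this. `MyClass`, `Derived` and `callFoo` are hypothetical; the 
explicit `cast(void*)` is only there to keep the pointer comparison 
simple, and `var.MyClass.foo()` is D's syntax for a non-virtual call 
to one specific implementation. An optimizer would emit this shape 
itself, so this is an illustration, not something to write by hand.

```d
class MyClass
{
    int foo() { return 1; }
}

class Derived : MyClass
{
    override int foo() { return 2; }
}

int callFoo(MyClass var)
{
    // Guess that the dynamic target is MyClass.foo and check the guess.
    if (cast(void*) (&var.foo).funcptr is cast(void*) &MyClass.foo)
        return var.MyClass.foo(); // direct call, can be inlined
    else
        return var.foo();         // fallback: ordinary virtual call
}

void main()
{
    assert(callFoo(new MyClass) == 1);
    assert(callFoo(new Derived) == 2);
}
```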

Rather than doing this whole-program optimization under the 
assumption (!) of knowing the full class hierarchy, I think it is 
better to work on preserving vtable knowledge across subsequent 
method calls, using the fact that an object's dynamic type cannot 
change at runtime. Although a class method call cannot change the 
vtable pointer, the GDC and LDC optimizers assume that it may be 
overwritten, because a non-const pointer to the object (the 
`this` ptr) is passed to the class method.

See: https://d.godbolt.org/z/xbajvT
```
     A a = new A();  // `scope` is needed for GDC
     a.foo(); // devirtualized
     a.foo(); // not devirtualized
     a.foo(); // not devirtualized
```
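
For anyone who wants to try this locally, a self-contained version 
might look like the following. The class body and `callThrice` are 
assumptions (the godbolt link has the original code). `foo` is kept 
out of line and writes to the object so that the optimizer cannot 
trivially prove the call leaves `this` untouched; whether a given 
compiler version shows exactly this pattern is not guaranteed.

```d
class A
{
    int counter;

    // Not inlined, and it modifies the object it is called on.
    pragma(inline, false)
    void foo() { ++counter; }
}

int callThrice()
{
    A a = new A(); // per the thread, GDC needs `scope` here
    a.foo(); // dynamic type is known from the allocation
    a.foo(); // vtable pointer is reloaded: `this` was passed to foo()
    a.foo(); // same as the previous call
    return a.counter;
}

void main()
{
    assert(callThrice() == 3);
}
```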

-Johan