How to get to a class initializer through introspection?

Wed Aug 5 16:08:59 UTC 2020

On Wed, 05 Aug 2020 14:36:37 +0000, Johan wrote:

> On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau wrote:
>>
>> I'd therefore suggest the following:
>> 1) Make all init symbols COMDAT: This ensures that if a symbol is
>> actually needed (address taken, real memcpy call) it will be available.
>> But if it is not needed, the compiler does not have to output the
>> symbol.
>> If it's required in multiple files, COMDAT will merge the symbols into
>> one.
>>
>> 2) Ensure the compiler always knows the data of that symbol. This
>> probably means during codegen, the initializer should never be an
>> external symbol. It needs to be a COMDAT symbol with attached
>> initializer expression. And the initializer data must always be fully
>> available in .di files.
>>
>> The two rules combined should allow the backend to choose the
>> initialization method that is most appropriate for the target
>> architecture.
> 
> What you are suggesting is pretty much exactly what the compilers
> already do. Except that we don't expose the initialization symbol
> directly to the user (T.init is an rvalue, and does not point to the
> initialization symbol), but through TypeInfo.initializer. Not exposing
> the initializer symbol to the user had a nice benefit: for cases where
> we never want to emit an initializer symbol (very large structs), we
> simply removed that symbol and started doing something else (memset
> zero), without breaking any user code. However this only works for
> all-zero structs, because TypeInfo.initializer must return a slice
> ({init symbol, length}) to data or {null,length} for all-zero (the
> 'null' is what we started making use of). More complex cases cannot
> elide the symbol.
> 
> Initializer functions would allow us to tailor initialization for more
> complex cases (e.g. with =void holes, padding shenanigans, or
> non-zero-but-repetitive-constant double[1million] arrays, ...), without
> having to always turn on some backend optimizations (at -O0) and without
> having to expose a TypeInfo.initializer slice, but instead exposing a
> TypeInfo.initializer function pointer.
> 
> -Johan
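
For reference, the slice convention described above can be observed from user code. This is a minimal sketch, assuming a reasonably recent druntime where TypeInfo.initializer() returns const(void)[]; the struct names and the two helper functions are made up for illustration, and the {null, length} result for the all-zero struct is the documented behaviour rather than something every compiler version is guaranteed to produce.

```d
import core.stdc.string : memcpy, memset;

struct AllZero { int a; int[4] b; }    // every field defaults to 0
struct NonZero { int a = 42; int b; }  // one field has a non-zero default

// Hypothetical helper: true when the runtime reports the type as all-zero,
// i.e. the initializer slice has a null pointer but the full type length.
bool hasNullInitializer(T)()
{
    const(void)[] init = typeid(T).initializer();
    return init.ptr is null && init.length == T.sizeof;
}

// Hypothetical helper: initialize raw memory from the slice.
// memset for the {null, length} case, memcpy from the init symbol otherwise.
void rawInit(T)(void* dst)
{
    const(void)[] init = typeid(T).initializer();
    if (init.ptr is null)
        memset(dst, 0, init.length);
    else
        memcpy(dst, init.ptr, init.length);
}

void main()
{
    assert(hasNullInitializer!AllZero);   // no init symbol needed
    assert(!hasNullInitializer!NonZero);  // slice points at the init symbol

    NonZero n = void;                     // deliberately uninitialized
    rawInit!NonZero(&n);
    assert(n.a == 42 && n.b == 0);
}
```

Only the all-zero case can drop the symbol here; anything else has to hand out a pointer to real data, which is the limitation being discussed.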

But initializer symbols are currently not in COMDAT, or does LDC 
implement that? That's a crucial point, as it addresses Andrei's 
initializer bloat point. And it also means you can avoid emitting the 
symbol if it's never referenced. But if it is referenced, it will be 
available.

Initializer functions have the drawback that backends can no longer 
choose different strategies for -Os or -O2. All the other benefits you 
mention (=void holes, padding shenanigans, or non-zero-but-repetitive-
constant double[1million] arrays, ...) can also be handled properly by the 
backend in the initializer-symbol case if the initializer expression is 
available to the backend. And you have to ensure that the initialization 
function can always be inlined, so without -O flags it may also lead to 
suboptimal code...

If the initializer optimizations depend on -O flags, it should also be 
possible to move the necessary steps in the backend into a different step 
which is executed even without optimization flags. Choosing to initialize 
using expressions vs. a symbol should not be an expensive step.

I don't see how an initializer function would be more flexible than that. 
In fact, you could generate the initializer function in the backend if 
information about the initialization expression is always preserved. 
Constructing an initializer function earlier (in the frontend, or D user 
code) removes information about the target architecture (-Os, memory 
available, efficient addressing of local constant data, ...). Because of 
that, I think the backend is the best place to implement this and the 
frontend should just provide the symbol initializer expression.
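
To illustrate the kind of decision I mean, here is a rough sketch of how a backend could pick an initialization strategy once it has the initializer data in hand. It is not taken from any compiler: the Strategy enum, chooseStrategy and the period parameter are all hypothetical, and a real backend would also weigh -Os/-O2, target addressing modes and size thresholds, which are left out here.

```d
enum Strategy { memsetZero, fillPattern, copySymbol }

// Hypothetical helper: choose a strategy from the raw initializer image.
// `period` is the element size to test for repetition (e.g. double.sizeof).
Strategy chooseStrategy(const(ubyte)[] image, size_t period)
    pure nothrow @safe @nogc
{
    // All-zero data needs no symbol at all: a memset is enough.
    bool allZero = true;
    foreach (b; image)
        if (b != 0) { allZero = false; break; }
    if (allZero)
        return Strategy.memsetZero;

    // Repetitive if every byte equals the byte one period earlier,
    // so a short fill loop can replace a large symbol.
    if (period != 0 && period < image.length)
    {
        bool repeats = true;
        foreach (i; period .. image.length)
            if (image[i] != image[i - period]) { repeats = false; break; }
        if (repeats)
            return Strategy.fillPattern;
    }

    // Otherwise fall back to copying from an emitted (COMDAT) symbol.
    return Strategy.copySymbol;
}

void main()
{
    static immutable int[3] zeros = [0, 0, 0];
    static immutable double[4] ones = [1.0, 1.0, 1.0, 1.0];
    static immutable int[3] mixed = [1, 2, 3];

    assert(chooseStrategy(cast(const(ubyte)[]) zeros[], int.sizeof)
           == Strategy.memsetZero);
    assert(chooseStrategy(cast(const(ubyte)[]) ones[], double.sizeof)
           == Strategy.fillPattern);
    assert(chooseStrategy(cast(const(ubyte)[]) mixed[], int.sizeof)
           == Strategy.copySymbol);
}
```

The scan is linear in the size of the initializer, so running it even without optimization flags should be cheap.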

-- 
Johannes