How to get to a class initializer through introspection?
Johan
j at j.nl
Wed Aug 5 22:19:11 UTC 2020
On Wednesday, 5 August 2020 at 16:08:59 UTC, Johannes Pfau wrote:
> Am Wed, 05 Aug 2020 14:36:37 +0000 schrieb Johan:
>
>> On Wednesday, 5 August 2020 at 13:40:16 UTC, Johannes Pfau
>> wrote:
>>>
>>> I'd therefore suggest the following:
>>> 1) Make all init symbols COMDAT: this ensures that if a symbol is
>>> actually needed (address taken, real memcpy call), it will be
>>> available. But if it is not needed, the compiler does not have to
>>> output the symbol. If it's required in multiple files, COMDAT will
>>> merge the symbols into one.
>>>
>>> 2) Ensure the compiler always knows the data of that symbol.
>>> This probably means during codegen, the initializer should
>>> never be an external symbol. It needs to be a COMDAT symbol
>>> with attached initializer expression. And the initializer
>>> data must always be fully available in .di files.
>>>
>>> The two rules combined should allow the backend to choose the
>>> initialization method that is most appropriate for the target
>>> architecture.
>>
>> What you are suggesting is pretty much exactly what the
>> compilers already do. Except that we don't expose the
>> initialization symbol directly to the user (T.init is an
>> rvalue, and does not point to the initialization symbol), but
>> through TypeInfo.initializer. Not exposing the initializer
>> symbol to the user had a nice benefit: for cases where we
>> never want to emit an initializer symbol (very large structs),
>> we simply removed that symbol and started doing something else
>> (memset zero), without breaking any user code. However, this
>> only works for all-zero structs, because TypeInfo.initializer
>> must return a slice ({init symbol, length}) to data or
>> {null,length} for all-zero (the 'null' is what we started
>> making use of). More complex cases cannot elide the symbol.
>>
>> Initializer functions would allow us to tailor initialization for
>> more complex cases (e.g. with =void holes, padding shenanigans, or
>> non-zero-but-repetitive-constant double[1million] arrays, ...),
>> without having to always turn on some backend optimizations (at
>> -O0) and without having to expose a TypeInfo.initializer slice,
>> instead exposing a TypeInfo.initializer function pointer.
>>
>> -Johan
>
> But initializer symbols are currently not in COMDAT, or does
> LDC implement that? That's a crucial point, as it addresses
> Andrei's initializer bloat point. And it also means you can
> avoid emitting the symbol if it's never referenced. But if it
> is referenced, it will be available.
It does not matter whether the initializer symbol is in COMDAT,
because (currently) it has to be dynamically accessible (e.g. by a
user of a compiled library, or by druntime's GC object-destroy code),
so whether it is referenced cannot be determined at compile/link
time.
> Initializer functions have the drawback that backends can no longer
> choose different strategies for -Os or -O2. All the other benefits
> you mention (=void holes, padding shenanigans, or
> non-zero-but-repetitive-constant double[1million] arrays, ...) can
> also be handled properly by the backend in the initializer-symbol
> case if the initializer expression is available to the backend. And
> you have to ensure that the initialization function can always be
> inlined, so without -O flags it may also lead to suboptimal code...
Backends can also turn an initializer function into a plain memcpy.
It's perfectly fine if code is suboptimal without -O.
You can simply express more with a function than with a symbol: a
symbol implies the fixed operation "memcpy everything", whereas a
function can do that and more.
How would you express =void using a symbol in an object file?
> If the initializer optimizations depend on -O flags, it should
> also be possible to move the necessary steps in the backend
> into a different step which is executed even without
> optimization flags. Choosing to initialize using expressions
> vs. a symbol should not be an expensive step.
Actually, this does sound like an expensive analysis to me (e.g.
detecting the case of a large array with repetitive initialization
inside a struct with a few other members). But more practically: is
it possible to enable/disable specific optimization passes for
individual functions with the GCC backend at -O0? (We can't with
LLVM.)
> I don't see how an initializer function would be more flexible
> than that. In fact, you could generate the initializer function
> in the backend if information about the initialization
> expression is always preserved. Constructing an initializer
> function earlier (in the frontend, or D user code) removes
> information about the target architecture (-Os, memory
> available, efficient addressing of local constant data, ...).
> Because of that, I think the backend is the best place to
> implement this and the frontend should just provide the symbol
> initializer expression.
I'm a little confused, because your last sentence is exactly what we
currently do, with the terminology: frontend = the dmd code that
outputs a semantically analyzed AST; backend = DMD/GCC/LLVM codegen,
possibly with a "glue layer" intermediate representation in between.
What I thought was being discussed in this thread is moving the
complexity out of the compilers (i.e. out of the current backends)
into druntime. For that, I think an initializer function is a good
solution (similar to emitting a constructor function, rather than
implementing that codegen inside the backend).
-Johan
More information about the Digitalmars-d mailing list