How about some __initialize magic?

Stanislav Blinov stanislav.blinov at gmail.com
Sat Nov 27 21:56:05 UTC 2021


D lacks syntax for initializing the uninitialized. We can do this:

```d
T stuff = T(args); // or new T(args);
```

but this?..

```d
T* ptr = allocateForT();
// now what?.. Can't just do *ptr = T(args) - that's an assignment, not initialization!
// is T a struct? A union? A class? An int?.. Is it even a constructor call?..
```
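
To make the assignment/initialization difference concrete, here is a small sketch using today's `core.lifetime.emplace`. `Resource`, its `garbage` sentinel and the `garbageDestroyed` counter are hypothetical, and a GC-allocated `void[]` block stands in for `allocateForT()`. What it tries to show: assigning through the pointer runs the destructor on whatever bits happen to be there, while `emplace` just overwrites them.

```d
import core.lifetime : emplace;

// Hypothetical type whose destructor notices when it runs on "garbage".
struct Resource
{
    enum garbage = 0xBAD;            // sentinel standing in for leftover bits
    static size_t garbageDestroyed;  // how often ~this() ran on those bits

    int id;

    this(int id) { this.id = id; }

    ~this()
    {
        if (id == garbage)
            ++garbageDestroyed;
    }
}

// Returns the sentinel-destruction count after the assignment and after emplace.
size_t[2] assignVersusEmplace()
{
    Resource.garbageDestroyed = 0;

    // Stand-in for allocateForT(): raw GC memory, never finalized as a Resource.
    auto ptr = cast(Resource*) (new void[](Resource.sizeof)).ptr;

    ptr.id = Resource.garbage; // pretend the memory holds garbage
    *ptr = Resource(1);        // assignment: the generated opAssign destroys the old value
    size_t afterAssign = Resource.garbageDestroyed;

    ptr.id = Resource.garbage; // garbage again
    emplace(ptr, 2);           // initialization: the old bits are simply overwritten

    return [afterAssign, Resource.garbageDestroyed];
}
```

Both counts come out as 1: the assignment "destroyed" an object that never existed, and the `emplace` call added nothing.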

This is, uh, "solved", using library functions - 
`emplaceInitializer`, `emplace`, `copyEmplace`, `moveEmplace`. 
The fact that there are __four__ functions to do this should 
already ring a bell, but if one were to look at how, e.g., 
`emplace` is implemented, there's lots and lots more to it - 
classes or structs? Constructor or no constructor? Postblit? 
Copy?.. And all the delegation... A single call to `emplace` may 
copy the bits around more than once. Talk about initializing a 
static array... Or look at `emplaceInitializer`, which the other 
three all depend upon: it is, currently, built on a hack just to 
avoid blowing up the stack (which is, ostensibly, what the previous, 
less hacky hack led to). The upcoming `__traits(initSymbol)` would 
help in removing the hack, but won't help CTFE any. At various 
points of their lives, these things even explicitly called 
`memcpy`, which is just... argh! And some still do 
(`copyEmplace`, I'm looking at you). Calling into the CRT to blit an 
8-byte struct? With statically known size and alignment? Just to 
sidestep the type system? Eh??? Much fun for copying arrays!
...And still, none of them would work in CTFE for many types, due 
to various implementation quirks (which include those very calls 
to memcpy, or reinterpret casts). This one could, potentially, be 
solved with more barbed wire and swear words, that is, code, 
but...
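
For reference, a minimal sketch of the status quo with the public `core.lifetime` functions. `Point` is a hypothetical type, and the no-argument `emplace(ptr)` overload is used where one would reach for `emplaceInitializer`, since both end up writing `T.init` here.

```d
import core.lifetime : copyEmplace, emplace, moveEmplace;

// Hypothetical plain struct; the non-zero default makes .init observable.
struct Point
{
    int x, y;
    string label = "origin";
}

// Four slots of uninitialized storage, each filled by a different library call.
Point[4] statusQuo()
{
    Point[4] slots = void;

    emplace(&slots[0]);              // write Point.init
    emplace(&slots[1], 3, 4, "p");   // fieldwise, as in Point(3, 4, "p")
    copyEmplace(slots[1], slots[2]); // copy-initialize (source, target)
    moveEmplace(slots[2], slots[3]); // move-initialize (source, target)

    return slots;
}
```

Four differently named entry points for what is, conceptually, one operation on four kinds of argument lists.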

Thing is, all those functions are re-implementing what the 
compiler can already do, but in a library. Or rather, come very 
close to doing that, but still don't really get there. C++ with 
its library solution does this better!

What if the language specified a "magic" function, called, say, 
`__initialize`, that would just do the right thing (tm)? Given an 
lvalue, it would instruct the compiler to generate code writing the 
initializer, blitting, copying, or calling the appropriate 
constructor with the arguments. And, most importantly, it would work 
in CTFE regardless of type, and not require weird dances around 
T.init, dummy types involving extra argument copies, or manual 
fieldwise and elementwise blits (which is what one would have to 
do in order to e.g. make `copyEmplace` CTFE-able).

I.e.:

```d
// Write .init
T* raw0 = allocateForT();
// currently - emplaceInitializer(raw0);
(*raw0).__initialize;

// Initialize fields or call constructor, whichever is applicable for T(arg1, arg2)
T* raw1 = allocateForT();
// currently - raw1.emplace(forward!(arg1, arg2));
(*raw1).__initialize(forward!(arg1, arg2));

// Copy
T* raw2 = allocateForT();
// currently - copyEmplace(*raw1, *raw2);
(*raw2).__initialize(*raw1);

// Move
T* raw3 = allocateForT();
// currently - moveEmplace(*raw2, *raw3);
(*raw3).__initialize(move(*raw2));

// Could be called at runtime or during CTFE
auto createArray()
{
    // big array, don't initialize
    const(T)[1000] result = void;
    // exception handling omitted for brevity
    foreach (i, ref it; result)
    {
        // currently - `emplace`, which may fail to compile in CTFE
        it.__initialize(createIthElement(i));
    }
    return result;
}

// CTFE use case:
static auto array = createArray();
```
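
As a point of comparison, the runtime half of `createArray` can already be written with `emplace`. This is a sketch under a few assumptions: `Item` and `createIthElement` are hypothetical stand-ins, and the elements are mutable rather than `const(T)` to keep the sketch simple. Whether the same code also runs in CTFE depends on `T` and on the druntime version, so that part is not claimed here.

```d
import core.lifetime : emplace;

// Hypothetical element type and factory, standing in for T and createIthElement.
struct Item
{
    size_t index;
}

Item createIthElement(size_t i)
{
    return Item(i * 2);
}

// Runtime-only counterpart of the proposed createArray.
Item[1000] createArray()
{
    Item[1000] result = void;              // big array, don't initialize
    foreach (i, ref it; result)
        emplace(&it, createIthElement(i)); // copy-initialize each slot in place
    return result;
}
```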

The wins are obvious - unified syntax, better error messages, 
CTFE support, less library voodoo failing at mimicking the 
compiler. The losses? I don't see any.

Note that I am not talking about yet another library function. 
This would not be a symbol in druntime, this would be compiler 
magic. Having that, `emplaceInitializer`, `emplace` and 
`copyEmplace` could be re-implemented in terms of `__initialize`, 
and eventually deprecated and removed. `moveEmplace` could linger 
until DIP1040 is implemented, tried, and proven. The `move` 
example, verbatim, would be pessimized compared to `moveEmplace` 
due to moving twice, which hopefully DIP1040 could solve.

I'm a bit hesitant to suggest how this should interact with 
`@safe`. On one hand, the established precedent is in `emplace` - 
it infers, and I'm leaning towards that, even though it can 
potentially invalidate existing state. On the other hand, because 
it can indeed invalidate existing state, it should be `@system`. 
But then it would require some additional facility just for 
inference, so that calls to it could correctly be marked `@trusted`; 
otherwise it'd be useless. And that facility, whatever it is, better not be 
another library reincarnation of all required semantics. For 
example, something like a `__traits(isSafeToInitWith, T, args)`. 
Whichever the approach, it should definitely infer all other 
attributes.

There are undoubtedly other things to consider. For example - 
classes. It would seem prudent for this hypothetical 
`__initialize` to call class ctors. On the other hand, a 
reference itself is just a POD, and generic code might indeed 
want to write null as opposed to attempting to call a default 
constructor. Then again, generic code still would have to 
specialize for classes... Thoughts welcome.
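
For what it's worth, today's `emplace` already covers both readings, which may help frame the question. A small sketch with a hypothetical `Widget` class: the pointer overload `emplace(&reference)` treats the reference as a POD and writes `null`, while `emplace!Widget(void[] chunk, args)` runs the constructor on caller-provided storage (here a GC block, assumed to satisfy the class alignment).

```d
import core.lifetime : emplace;

// Hypothetical class used only for illustration.
class Widget
{
    int id;
    this(int id) { this.id = id; }
}

// Reading 1: the reference is just a POD slot, so "initializing" it writes null.
Widget initReference()
{
    Widget r = void;
    emplace(&r);
    return r;
}

// Reading 2: construct the object itself in raw memory and return a reference to it.
Widget constructInstance(int id)
{
    void[] storage = new void[](__traits(classInstanceSize, Widget));
    return emplace!Widget(storage, id);
}
```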

What do you think? DIP this, yay or nay? Suggestions?..


More information about the Digitalmars-d mailing list