When D feels unfinished: union initialization and NRVO

Wed Mar 18 06:55:24 UTC 2020

So I've been toying around for a bit with writing a deserializer
in D. It essentially converts between types and `ubyte[]` in a
very simple way. It handles value types automatically, as well as
pointers and arrays. Nothing too fancy, but I wanted to write
something *correct*, and compatible with the types I'm dealing with.

The issue I was faced with is how to handle qualifiers. For 
example, deserializing `const` or `immutable` data. Originally, 
my deserializer accepted a `ref T` as parameter and would 
deserialize into it. I changed it to return an element of type 
`T`.
Deserialization should be composable, so if an aggregate defines
the `fromBinary` static method, it is used instead of whatever
the default for this type would otherwise be, and that method can
forward to other `deserialize` calls to deserialize its members.

Now here's the catch: Some of the things being deserialized are 
C++ types, and may include `std::vector`, so I wanted to avoid 
any unnecessary copy.

This set of requirements led me to a few simple observations:
- I cannot use a temporary and `cast`. Aside from the fact that 
most casts are an admission that the type system is insufficient, 
it would force me to pass the type by `ref` when composing, which 
would expose the `cast` to user code, hence not `@safe`;
- In order to avoid unnecessary copies while returning by value, I
need to rely heavily on NRVO (the `cast` approach would also
conflict with this);
- Hence, I need to be able to return literals of everything.

Approaching this for simple value types (int, float, etc...) is
trivial. When it comes to aggregates, things get a bit more
complicated. An aggregate can be made of other arbitrarily
complex aggregates. The solution I have so far is to require a
default-like constructor and have:
```
T deserialize (T) (scope DeserializeDg dg, scope const ref Options opts)
{
     // Loads of code
     else static if (is(T == struct))
     {
         Target convert (Target) ()
         {
             // `dg` is a delegate returning a `ubyte[]`, and
             // `opts` are options to drive deserialization
             return deserialize!Target(dg, opts);
         }
         return T(staticMap!(convert, Fields!T));
     }
}
```
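To make the idea concrete, here is a minimal, self-contained sketch of the same field-by-field construction. It is not the real deserializer: `Point` and `Segment` are made-up types, the delegate and options are replaced by a plain `ref ubyte[]` cursor, and scalars are read with `std.bitmanip.read` (big-endian by default), all of which are assumptions for the sake of illustration.

```d
import std.bitmanip : read;
import std.meta : staticMap;
import std.traits : Fields;

// Hypothetical example types, not part of the original code
struct Point { int x; int y; }
struct Segment { Point start; Point end; ubyte tag; }

T deserialize (T) (ref ubyte[] data)
{
    static if (is(T == struct))
    {
        // One instantiation per field type; each call consumes
        // bytes from `data` and returns an rvalue
        Target convert (Target) ()
        {
            return deserialize!Target(data);
        }
        // Expands to T(convert!F1(), convert!F2(), ...),
        // evaluated left to right
        return T(staticMap!(convert, Fields!T));
    }
    else
        // Scalars: consume T.sizeof bytes (big-endian by default)
        return data.read!T();
}

unittest
{
    ubyte[] data = [0, 0, 0, 1,  0, 0, 0, 2,  0, 0, 0, 3,  0, 0, 0, 4,  9];
    auto seg = deserialize!Segment(data);
    assert(seg.start == Point(1, 2));
    assert(seg.end == Point(3, 4));
    assert(seg.tag == 9);
}
```

The whole aggregate is built from a single constructor-style expression, so no field is ever default-constructed and then assigned.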

As any D user should be, I was slightly skeptical, so I made a 
reduced test case: 
https://gist.github.com/Geod24/61ef0d8c57c3916cd3dd7611eac8234e
It works as expected, which makes sense as we want to be
consistent with the C++ standard, which requires the copy to be
elided on return when the operand is a prvalue (since C++17).
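The gist is not reproduced here, but a rough way to check the "no copy" part at compile time is to disable the postblit. This is a sketch with hypothetical types (`NoCopy`, `Wrapper`): it only shows that returning and composing rvalues never requires a copy, not that the object is constructed in place.

```d
// Hypothetical non-copyable type: any copy is a compile error
struct NoCopy
{
    int value;
    @disable this(this);
}

struct Wrapper
{
    NoCopy a;
    NoCopy b;
}

NoCopy make (int v)
{
    // The operand of `return` is an rvalue, so no postblit is needed
    return NoCopy(v);
}

Wrapper makeWrapper ()
{
    // Rvalue arguments initialize the fields without copying
    return Wrapper(make(1), make(2));
}

unittest
{
    auto w = makeWrapper();
    assert(w.a.value == 1);
    assert(w.b.value == 2);
    // Copying is rejected at compile time
    static assert(!__traits(compiles, { NoCopy c = w.a; }));
}
```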

However, not all structs are created equal, and some are not
under my control (remember, C++ bindings). And yes, this is where
the rant begins.

How do you initialize the following?
```
struct Statement
{
     // This is an enum to discriminate which field is active
     StatementType type;
     union {                 // Oh no
         // Each of those are complex structs with std::array,
         // std::vector, etc...
         _prepare_t prepare_;
         _confirm_t confirm_;
     }
}
```

Of course my deserializer can't know about our custom tagged 
union, but luckily we have a hook, so (pseudo code again):
```
struct Statement
{
     /* Above definitions */
     static QT deserializeHook (QT) (scope DeserializeDg dg, scope const ref Options opts)
     {
         // deserialize `type`
         // then use a `final switch` and
         // `return QT(type, deserialize!ActiveType(...))`
     }
}
```

Side note: `QT` is required here, because there's no way to know 
if `deserializeHook` was called via an `immutable(T)`, 
`const(shared(T))`, or just `T`.

The problem you face when you write this code is calling the `QT` 
constructor. Because the `union` is anonymous, 
`Statement.tupleof.length` is 3, not 2 as one would expect. And 
while calling `QT(type, _prepare_t.init)` works, calling 
`QT(type, _confirm_t.init)` will complain about mismatched type, 
because we are trying to initialize the second member, a 
`_prepare_t`, with a `_confirm_t`. And using `QT(type, 
_prepare_t.init, _confirm_t.init)` won't work either, because 
then the compiler complains about overlapping initialization!
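The three cases can be checked at compile time. In this sketch, the bodies of `_prepare_t` and `_confirm_t` and the members of `StatementType` are stand-ins I made up (the real ones are complex C++ bindings); only the layout of `Statement` follows the post.

```d
// Stand-in definitions, for illustration only
enum StatementType { prepare, confirm }
struct _prepare_t { int ballot; }
struct _confirm_t { long counter; }

struct Statement
{
    StatementType type;
    union
    {
        _prepare_t prepare_;
        _confirm_t confirm_;
    }
}

// Members of the anonymous union count as fields of `Statement`
static assert(Statement.tupleof.length == 3);

// The second positional argument maps to `prepare_`: accepted
static assert( __traits(compiles,
    Statement(StatementType.prepare, _prepare_t(1))));

// A `_confirm_t` cannot initialize `prepare_`: mismatched type
static assert(!__traits(compiles,
    Statement(StatementType.confirm, _confirm_t(1))));

// Providing both union members: overlapping initialization
static assert(!__traits(compiles,
    Statement(StatementType.confirm, _prepare_t(1), _confirm_t(1))));
```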

There's a small feature that would be amazing here: struct
literals with named fields (the `{ field: value }` initializer
syntax)! Unfortunately, they can *only* be used in variable
declarations, nowhere else.
But is it really a problem? Can't we just do the following:
```
QT ret = { type: type, confirm_: deserialize!_confirm_t(dg, opts) };
return ret;
```
Well no, because then, NRVO is not performed anymore.
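For reference, the declaration form itself does accept the second union member when it is named. Below is a sketch with the same kind of stand-in types (the bodies of `_prepare_t` and `_confirm_t`, the enum members, and the `makeConfirm` helper are all hypothetical). It only shows that the initializer compiles and sets the right member; it says nothing about whether the named return value is elided.

```d
// Stand-in definitions, for illustration only
enum StatementType { prepare, confirm }
struct _prepare_t { int ballot; }
struct _confirm_t { long counter; }

struct Statement
{
    StatementType type;
    union
    {
        _prepare_t prepare_;
        _confirm_t confirm_;
    }
}

// Hypothetical helper: builds the `confirm` variant
Statement makeConfirm (long counter)
{
    // Naming the field selects which union member is initialized.
    // This syntax is only accepted in a variable declaration.
    Statement ret = {
        type: StatementType.confirm,
        confirm_: _confirm_t(counter),
    };
    return ret;
}

unittest
{
    auto s = makeConfirm(42);
    assert(s.type == StatementType.confirm);
    assert(s.confirm_.counter == 42);
}
```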

I've been toying around with this problem for a few weeks, on and
off. I really couldn't find a way to make it work. Using a named
union just moves the problem to the union literal (which is a
struct literal, under the hood). Guaranteeing NRVO could have a
negative impact on C/C++ interop, so the only thing that could
help is to extend struct literals. Changing struct constructors to
account for `union` is not possible either, because a `union`
can have multiple fields of the same type.

Note that this is just the tip of the iceberg. Has anyone ever
tried to make an array literal of a non-copyable structure in one
go? Thanks to tuples, one can use the `staticMap` approach if the
length is known at compile time, but what happens if it's only
known at runtime? `iota + map + array` does not work with
`@disable this(this)`. And let's not even mention AA literals.

We've had quite a few new features making their way into the
language over the past few years, but many of the old features
are left unfinished. We have a new contract syntax, but contracts
are still quite broken (quite a few bugs, as well as usability
issues, e.g. one can't call the parent's contract). We are
interfacing more and more with C++, but don't have the ability to
control copies, and the compiler and Phobos alike assume things
are copyable (you can't foreach over a range which has `@disable
this(this)`). We want to make the language `@safe` by default,
but we lack the language constructs to build libraries that work
with both `@system` and `@safe`. Our default setup for `assert`
is still not on par with what C does with a macro, and
`-checkaction=context` is far from being ready (mostly due to the
issues mentioned previously). We are piling up `-transition`
switches 10 times faster than we are removing them.

This could go on for a while, but the point I wanted to make is: 
can we focus on the last 20%, please?