When D feels unfinished: union initialization and NRVO
Mathias Lang
pro.mathias.lang at gmail.com
Wed Mar 18 06:55:24 UTC 2020
So I've been toying around for a bit with writing a deserializer
in D. It essentially converts `ubyte[]` data back into typed
values in a very simple way. It handles value types automatically,
as well as pointers and arrays. Nothing too fancy, but I wanted to
write something *correct*, and compatible with the types I'm
dealing with.
The issue I was faced with is how to handle qualifiers. For
example, deserializing `const` or `immutable` data. Originally,
my deserializer accepted a `ref T` as parameter and would
deserialize into it. I changed it to return an element of type
`T`.
Deserialization should be composable, so if an aggregate defines
the `fromBinary` static method, it is used instead of whatever
the default for this type would otherwise be, and that method can
forward to other `deserialize` calls to deserialize its members.
Now here's the catch: Some of the things being deserialized are
C++ types, and may include `std::vector`, so I wanted to avoid
any unnecessary copy.
This set of requirements led me to a few simple observations:
- I cannot use a temporary and `cast`. Aside from the fact that
most casts are an admission that the type system is insufficient,
it would force me to pass the type by `ref` when composing, which
would expose the `cast` to user code, hence not `@safe`;
- In order to avoid unnecessary copies while returning by value,
I need to rely heavily on NRVO (the `cast` approach would also
conflict with this);
- Hence, I need to be able to return literals of everything.
Approaching this for simple value types (int, float, etc...) is
trivial. When it comes to aggregates, things get a bit more
complicated. An aggregate can be made of other arbitrarily
complex aggregates. The solution I have so far is to require a
default-like constructor and have:
```
T deserialize (T) (scope DeserializeDg dg, scope const ref Options opts)
{
    // Loads of code
    else static if (is(T == struct))
    {
        Target convert (Target) ()
        {
            // `dg` is a delegate returning a `ubyte[]`, and
            // `opts` are options to drive deserialization
            return deserialize!Target(dg, opts);
        }
        return T(staticMap!(convert, Fields!T));
    }
}
```
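To make the composition above concrete, here is a reduced, self-contained sketch of the same `staticMap` pattern. It is not the code from my project: `DeserializeDg` is given a hypothetical signature (return the next `size` bytes), `Options` is dropped, the `Point` / `Segment` types are made up, and integers are assumed to be stored little-endian.

```d
import std.meta : staticMap;
import std.traits : Fields;

/// Hypothetical delegate type: returns the next `size` bytes of input
alias DeserializeDg = const(ubyte)[] delegate (size_t size);

// Made-up aggregates, used only for illustration
struct Point { int x; int y; }
struct Segment { Point from; Point to; ubyte color; }

T deserialize (T) (scope DeserializeDg dg)
{
    static if (is(T == ubyte))
        return dg(1)[0];
    else static if (is(T == int))
    {
        // Assumption: integers are encoded as 4 little-endian bytes
        auto b = dg(4);
        return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
    }
    else static if (is(T == struct))
    {
        // One call per field, evaluated left to right
        Target convert (Target) ()
        {
            return deserialize!Target(dg);
        }
        return T(staticMap!(convert, Fields!T));
    }
    else
        static assert(0, "Unsupported type: " ~ T.stringof);
}

unittest
{
    const(ubyte)[] input = [1, 0, 0, 0,  2, 0, 0, 0,
                            3, 0, 0, 0,  4, 0, 0, 0,  255];
    // Hands out the requested number of bytes and advances the input
    const(ubyte)[] next (size_t size)
    {
        auto chunk = input[0 .. size];
        input = input[size .. $];
        return chunk;
    }
    auto seg = deserialize!Segment(&next);
    assert(seg == Segment(Point(1, 2), Point(3, 4), 255));
    assert(input.length == 0);
}
```

It can be tried with `dmd -unittest -main -run`. The point of the pattern is that every field is produced as an rvalue directly inside the constructor call, so no named temporary is involved.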
As any D user should be, I was slightly skeptical, so I made a
reduced test case:
https://gist.github.com/Geod24/61ef0d8c57c3916cd3dd7611eac8234e
It works as expected, which makes sense as we want to be
consistent with the C++ standard, which (since C++17) requires
copy elision on return when the operand is a prvalue.
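One cheap way to let the compiler check that no copy sneaks in is to compose non-copyable types. The sketch below is not the content of the gist; `NoCopy`, `Pair`, `makeOne` and `makePair` are hypothetical names. If any step required a copy, the code would be rejected because of `@disable this(this)`.

```d
// A type that can be moved but never copied
struct NoCopy
{
    int value;
    @disable this(this);
}

// An aggregate made of non-copyable fields
struct Pair
{
    NoCopy first;
    NoCopy second;
}

// Sanity check: copying an lvalue `NoCopy` is rejected
static assert(!__traits(compiles, { NoCopy a; NoCopy b = a; }));

// Returns an rvalue, so no copy is needed
NoCopy makeOne (int value)
{
    return NoCopy(value);
}

// Each field is initialized from an rvalue inside the literal
Pair makePair (int a, int b)
{
    return Pair(makeOne(a), makeOne(b));
}

unittest
{
    auto pair = makePair(1, 2);
    assert(pair.first.value == 1);
    assert(pair.second.value == 2);
}
```

This only proves the absence of postblit calls; it says nothing about whether the value was built in place or moved, which is what matters for C++ types.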
However, not all structs are created equal, and some are not
under my control (remember, C++ bindings). And yes, this is where
the rant begins.
How do you initialize the following?
```
struct Statement
{
    // This is an enum to discriminate which field is active
    StatementType type;
    union { // Oh no
        // Each of those are complex structs
        // with std::array, std::vector, etc...
        _prepare_t prepare_;
        _confirm_t confirm_;
    }
}
```
Of course my deserializer can't know about our custom tagged
union, but luckily we have a hook, so (pseudo code again):
```
struct Statement
{
    /* Above definitions */
    static QT deserializeHook (QT) (scope DeserializeDg dg,
                                    scope const ref Options opts)
    {
        // deserialize `type`
        // then use a `final switch` and
        // `return QT(type, deserialize!ActiveType(...))`
    }
}
```
Side note: `QT` is required here, because there's no way to know
if `deserializeHook` was called via an `immutable(T)`,
`const(shared(T))`, or just `T`.
The problem you face when you write this code is calling the `QT`
constructor. Because the `union` is anonymous,
`Statement.tupleof.length` is 3, not 2 as one would expect. And
while calling `QT(type, _prepare_t.init)` works, calling
`QT(type, _confirm_t.init)` will complain about mismatched type,
because we are trying to initialize the second member, a
`_prepare_t`, with a `_confirm_t`. And using `QT(type,
_prepare_t.init, _confirm_t.init)` won't work either, because
then the compiler complains about overlapping initialization!
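The behaviour described above can be reproduced with a reduced version of the type. In this sketch `Prepare` and `Confirm` are hypothetical stand-ins for `_prepare_t` and `_confirm_t` (the real ones wrap C++ containers), and the checks encode what I observed with the compiler I used.

```d
enum StatementType { prepare, confirm }

// Hypothetical stand-ins for the real C++ payload types
struct Prepare { int ballot; }
struct Confirm { long counter; }

struct Statement
{
    StatementType type;
    union
    {
        Prepare prepare_;
        Confirm confirm_;
    }
}

// Members of the anonymous union count as fields of `Statement`
static assert(Statement.tupleof.length == 3);

// Positional arguments are matched against the fields in order,
// so the second argument always targets `prepare_`
static assert( __traits(compiles,
    Statement(StatementType.prepare, Prepare.init)));

// A `Confirm` cannot initialize the `Prepare` field
static assert(!__traits(compiles,
    Statement(StatementType.confirm, Confirm.init)));

// Providing both union members is rejected as overlapping
static assert(!__traits(compiles,
    Statement(StatementType.confirm, Prepare.init, Confirm.init)));

unittest
{
    auto s = Statement(StatementType.prepare, Prepare(42));
    assert(s.type == StatementType.prepare);
    assert(s.prepare_.ballot == 42);
}
```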
There's a small feature that would be amazing here: struct
literals of the `{ field: value }` form (struct initializers, in
the grammar's terms)! Unfortunately, they can *only* be used in
variable declarations, nowhere else.
But is it really a problem? Can't we just do the following:
```
QT ret = { type: type,
           confirm_: deserialize!_confirm_t(dg, opts) };
return ret;
```
Well no, because then, NRVO is not performed anymore.
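For completeness, here is what that fallback looks like once spelled out for every case of the tag. It is a sketch with hypothetical names (`Prepare`, `Confirm`, `fromParts`, and a plain `long` standing in for the deserialized payload). It compiles and selects the right union member, but every return goes through a named variable, which is exactly the form I was trying to avoid.

```d
enum StatementType { prepare, confirm }

// Hypothetical stand-ins for the real C++ payload types
struct Prepare { int ballot; }
struct Confirm { long counter; }

struct Statement
{
    StatementType type;
    union
    {
        Prepare prepare_;
        Confirm confirm_;
    }
}

// Builds a `Statement` with the union member matching `type`.
// `payload` stands in for whatever would be deserialized.
Statement fromParts (StatementType type, long payload)
{
    final switch (type)
    {
    case StatementType.prepare:
    {
        // The initializer syntax is only allowed in a declaration
        Statement ret = { type: type,
                          prepare_: Prepare(cast(int) payload) };
        return ret;
    }
    case StatementType.confirm:
    {
        Statement ret = { type: type,
                          confirm_: Confirm(payload) };
        return ret;
    }
    }
}

unittest
{
    auto c = fromParts(StatementType.confirm, 42);
    assert(c.type == StatementType.confirm);
    assert(c.confirm_.counter == 42);

    auto p = fromParts(StatementType.prepare, 7);
    assert(p.type == StatementType.prepare);
    assert(p.prepare_.ballot == 7);
}
```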
I've been toying around with this problem for a few weeks, on and
off. I really couldn't find a way to make it work. Using a named
union just moves the problem to the union literal (which is a
struct literal, under the hood). Guaranteeing NRVO could have a
negative impact on C/C++ interop, so the only thing that could
help is to extend struct literals. Changing struct constructors
to account for `union` is not possible either, because a `union`
can have multiple fields of the same type.
Note that this is just the tip of the iceberg. Has anyone ever
tried to make an array literal of a non-copyable structure in one
go? Thanks to tuples, one can use the `staticMap` approach if the
length is known at compile time, but what happens if it's only
known at runtime? `iota + map + array` does not work with
`@disable this(this)`. And let's not even mention AA literals.
We've had quite a few new features making their way into the
language over the past few years, but many of the old features
are left unfinished. We have a new contract syntax, but contracts
are still quite broken (quite a few bugs, as well as usability
issues, e.g. one can't call the parent's contract). We are
interfacing more and more with C++, but don't have the ability to
control copies, and the compiler and Phobos alike assume things
are copyable (you can't foreach over a range which has `@disable
this(this)`). We want to make the language `@safe` by default,
but we lack the language constructs to build libraries that work
with both `@system` and `@safe`. Our default setup for `assert`
is still not on par with what C does with a macro, and
`-checkaction=context` is far from being ready (mostly due to the
issues mentioned previously). We are piling up `-transition`
switches 10 times faster than we are removing them.
This could go on for a while, but the point I wanted to make is:
can we focus on the last 20%, please?