=void in struct definition

Wed Apr 11 09:20:17 UTC 2018

On Wednesday, April 11, 2018 11:31:16 Shachar Shemesh via Digitalmars-d 
wrote:
> On 11/04/18 10:58, Jonathan M Davis wrote:
> > All objects are initialized with their init values prior to the
> > constructor being called. So, whether an object is simply
> > default-initialized or whether the constructor is called, you're going
> > to get the same behavior except for the fact that the constructor would
> > normally do further initialization beyond the init value. As such, if
> > there's a problem with the default-initialized value, you're almost
> > certainly going to get the same problem when you call a constructor.
> >
> > - Jonathan M Davis
>
> That's horrible!
>
> That means that constructor initialized objects, regardless of size, get
> initialized twice.

Well, only the stuff you initialize in the constructor gets initialized
twice, but yeah, it could result in effectively initializing everything
twice if you initialize everything in the constructor. It's one of those
design choices that's geared towards correctness, since it avoids ever
dealing with the type having garbage, and the fact that you can do stuff
like

import std.stdio : writeln;

struct S
{
    int _i;

    this(int i)
    {
        foo();   // runs before _i is assigned, so it sees int.init (0)
        _i = 42;
    }

    void foo()
    {
        writeln(_i);
    }
}

means that if it doesn't initialize it with the init value first, then you
get undefined behavior, because _i would then be garbage when it's read
(which isn't necessarily a big deal with an int but could really matter if
it were something like a pointer). It also factors into how classes are
guaranteed to be fully initialized to the correct type _before_ any
constructors are run (avoiding the problems that you get in C++ when calling
virtual functions in constructors or destructors). Unfortunately, because
you're allowed to call arbitrary functions before initializing members, it's
also possible to violate the type system with regard to const or immutable.
e.g.

struct S
{
    immutable int _i;

    this(int i)
    {
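        foo();   // reads the immutable _i while it still holds int.init
        _i = 42; // the first assignment in the constructor initializes _i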
    }

    void foo()
    {
        writeln(_i);
    }
}

reads _i before it's fully initialized, so its state isn't identical every
time it's accessed like it's supposed to be. However, because the object is
default-initialized first, you never end up reading garbage, and the
behavior is completely deterministic even if it arguably violates the type
system. What the correct solution to that particular problem is, I don't
know (probably at least disallowing calling any member functions prior to
initializing any immutable or const members), but the fact that the object
is default-initialized first reduces the severity of the problem.
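
To make both guarantees concrete, here is a minimal sketch. The names Probe,
seenBefore, peek, Base, Derived and seenInCtor are made up for illustration.
It records what a member function sees before the constructor assigns an
immutable field, and which override a base class constructor ends up
calling.

```d
import std.stdio : writeln;

// All names below are hypothetical and exist only for this demonstration.
struct Probe
{
    immutable int _i;
    int seenBefore;

    this(int i)
    {
        seenBefore = peek(); // _i has not been assigned yet, so this reads int.init
        _i = i;              // first assignment initializes the immutable member
    }

    int peek() const
    {
        return _i;
    }
}

class Base
{
    string seenInCtor;

    this()
    {
        seenInCtor = name(); // virtual call made from the base constructor
    }

    string name()
    {
        return "Base";
    }
}

class Derived : Base
{
    string _label = "Derived"; // part of the init state, set before any constructor runs

    override string name()
    {
        return _label;
    }
}

void main()
{
    auto p = Probe(42);
    writeln(p.seenBefore); // 0
    writeln(p._i);         // 42

    auto d = new Derived;
    writeln(d.seenInCtor); // Derived
}
```

The struct observes two different values for the same immutable member (0,
then 42), which is the type system hole, but neither value is garbage. The
class case shows the base constructor already dispatching to the derived
override, with the derived member holding its init value.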

And while you can end up with portions of an object effectively being
initialized twice, for your average struct, I doubt that it matters much.
It's when you start doing stuff like having large static arrays that it
really becomes a problem. It also wouldn't surprise me if ldc optimized out
some of the double-initializations at least some of the time, but I very
much doubt that dmd's optimizer is ever that smart. Depending on the
implementation of the constructor though, I would think that it would be
possible for the compiler to determine that it doesn't actually need to
default-initialize the struct first (or that it can just default-initialize
pieces of it), because it can guarantee that a member variable isn't read
before it's initialized by the constructor. So, at least in theory, the
front end should be able to do some optimizations there. However, I have no
idea if it ever does.

I think that in theory, the idea is that we want initialization to be as
correct as possible, so there should be no garbage or undefined behavior
involved, and in the case of classes, the object should be fully the type
that it's supposed to be when its constructor is called so that you don't
get bad behavior from virtual functions, but we then have = void so that
specific variables can avoid that extra initialization cost when profiling
or whatnot shows that it's important. So, if you have something like

struct S
{
    int _a;
    int[5000] _b;

    this(int a)
    {
        _a = a;
    }
}

then it's going to behave well as far as correctness goes, and then if the
initialization is too expensive, you do

S s = void;
s._a = 42;

I think that the problem is that void initialization was intended
specifically for local variables, and the idea of = void for member
variables was not really thought through. So, you can easily do something
like

S s = void;
s._a = 42;

right now and avoid the default-initialization, but you can't cleanly do

struct S
{
    int _a;
    int[5000] _b = void;

    this(int a)
    {
        _a = a;
    }
}

So, the process is completely manual, which obviously sucks if it's
something that you _always_ want to do with the type.
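
One way to make the manual step less error-prone is to hide it behind a
factory function. This is only a sketch: makeWithoutInit is a hypothetical
name, not something the language or Phobos provides, and whether the
returned value is built in place without an extra copy is left to the
compiler.

```d
import std.stdio : writeln;

struct S
{
    int _a;
    int[5000] _b;

    this(int a)
    {
        _a = a;
    }

    // Hypothetical helper: skips default-initialization entirely and sets
    // only _a. The contents of _b are garbage until the caller writes them.
    static S makeWithoutInit(int a)
    {
        S s = void;
        s._a = a;
        return s;
    }
}

void main()
{
    auto s = S.makeWithoutInit(42);
    writeln(s._a);    // 42

    s._b[] = 0;       // _b must be written before it is read
    writeln(s._b[0]); // 0
}
```

The caller still has to remember that _b is uninitialized, so this only
moves the manual step into one place. It doesn't give the type the
per-member opt-out that = void on the member declaration would suggest.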

In general, D favors correctness over performance with the idea that it gives
you backdoors to get around the correctness guarantees in order to get more
performance when it matters, but in this case, the backdoor arguably needs
some improvement.

- Jonathan M Davis