The Atom Consists of Protons, Neutrons and Electrons

Mon Feb 4 16:23:40 PST 2013

Introduction:

A somewhat heated discussion between Steven Schveighoffer and 
myself led to his challenging me to show not only how properties 
could be implemented as structs, but also why that is the best 
way for D to implement them.

The challenge is to do better, both in terms of functionality and 
in terms of syntax, than his proposal:

@property foo {
    int get(); // or opGet
    void set(int val); // or opSet

    opBinary(...)  // etc.
}

The @property namespace defined above implements all operator 
overloads a struct is capable of, but with access to its 
surrounding scope. For example:

struct Goo
{
   int _foo;
   property foo { int get(){ return _foo; } }
}

Good stuff, and D would not be the worse off for a property 
implementation such as this. Note how I removed the @ in front of 
property, because if we go this far, we might as well just go all 
the way and add it to the language as a keyword.

My job, therefore, was to imagine how simple structs could, at 
least in theory, do all of this, plus some, thus providing a 
better product and saving D a keyword.

I hope I'm not too late with this proposal. I've used the 
metaphor of the atom (i.e. explicit properties), adhering to the 
theory that there's no need to provide atoms if you can provide 
all of the protons, neutrons, and electrons they consist of.

Part One: Neutrons

We'll start with the heaviest particle first.

Why is it that a struct nested inside a function is allowed 
access to its function's data, whereas a struct nested inside a 
struct receives no such privilege?

void func()
{
   int n;
   struct G { int getN() { return n; } } // Hey, no problem
}

struct foo
{
   int n;
   struct G { int getN() { return n; } } // Error: n not defined
}

Is it too much to ask that a struct gain access to an instance of 
its parent's data? Well, yes. First of all, the nested instance 
would have to hold a hidden pointer to a parent instance, not 
only bloating the nested instance but also risking losing track 
of the desired parent instance should the parent instance get 
moved in memory.
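
For comparison, today's D already pays exactly this cost for structs nested in functions. Here is a small sketch of it, using std.traits.isNested; the names Nested, Plain and nestedCarriesContext are made up for illustration.

```d
import std.traits : isNested;

// Returns true if the function-nested struct can read `n` through its
// hidden context pointer.
bool nestedCarriesContext()
{
    int n = 42;

    // Non-static struct nested in a function: its method reads `n`,
    // so every instance stores a hidden pointer to this function's frame.
    struct Nested
    {
        int x;
        int getN() { return n; }
    }

    // `static` opts out: no hidden pointer, and no access to `n`.
    static struct Plain
    {
        int x;
    }

    static assert(isNested!Nested);
    static assert(!isNested!Plain);
    static assert(Plain.sizeof == int.sizeof);
    static assert(Nested.sizeof >= (void*).sizeof); // room for the pointer
    static assert(Nested.sizeof > Plain.sizeof);    // the "bloat"

    Nested g; // declared here, so the context pointer is filled in
    return g.getN() == 42;
}

void main()
{
    assert(nestedCarriesContext());
}
```

The size difference is the bloat mentioned above; the proposal below is an attempt to get the access without storing anything in the instance.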

But wait. A struct's member functions act upon its _own_ data 
just fine. That's because they are designed to receive a hidden 
pointer to an instance of their struct. And it's not actually the 
nested struct's data which would want to operate upon its parent 
struct, anyway. After all, data doesn't act upon data. The 
machine code containing the instructions to operate upon data is 
never identical to the data it needs to operate on (footnote: 
with the notable exception of the video game Yars' Revenge for 
the Atari 2600, in which the machine code was actually used to 
create an ad hoc random color palette - see the book "Racing the 
Beam" by Nick Montfort and Ian Bogost). Would it therefore be 
possible to allow the nested struct's _functions_ to operate upon 
an instance of its parent struct?

I think so. Here's how it would work. When a function is being 
compiled, the compiler keeps two structures: a short list of the 
struct types for which it must include hidden pointers, and a 
stack of the functions currently being analyzed. It adds pointers 
to the list and attaches them to the symbols according to the 
following algorithm. If the symbol is not found in the function 
definition itself:

1. Look for it at the level of the enclosing struct.
2. If it is found there and it names a data member, check the 
pointer list for that struct's type, and add it if it's not there 
already. Attach the symbol to that instance and move on, you're 
done.
3. If it is found and it represents a function, and semantic has 
already been run on the called function, add any hidden pointers 
it requires to your own list and attach them to the call. You're 
done.
4. If it represents a function and semantic has not been done, 
check the stack for the function represented by the symbol. If it 
is found, stop. It will take a second semantic pass to attach the 
right pointers. Otherwise, add the current function to the stack 
and analyze the function represented by the symbol. Add the 
hidden pointers it needs to your own list, attach them to the 
call, and you're done.
5. If it is not found and the struct is marked static, or if the 
struct in question is being defined at module level, you're done 
here. Continue to look up the symbol at the module and import 
levels.
6. If it is not found and the struct definition is nested inside 
another struct, look for it in that struct. Go to step 2.
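
The steps above can be sketched as a toy model in today's D. This is not compiler code: StructInfo, FuncInfo, analyze and needsOf are hypothetical names, symbols are plain strings, and the second semantic pass from step 4 is not modelled.

```d
import std.algorithm.searching : canFind;

// Toy model of a struct declaration.
class StructInfo
{
    string name;
    StructInfo parent;      // enclosing struct, or null at module level
    bool isStatic;          // a static struct stops the outward search
    string[] fields;        // names of data members
    FuncInfo[string] funcs; // member functions by name

    this(string name, StructInfo parent, string[] fields)
    {
        this.name = name;
        this.parent = parent;
        this.fields = fields;
    }

    void addFunc(string fname, string[] uses)
    {
        funcs[fname] = new FuncInfo(this, uses);
    }
}

// Toy model of a member function.
class FuncInfo
{
    StructInfo owner;
    string[] uses;  // symbols not declared in the function body itself
    string[] needs; // struct types for which a hidden pointer is required
    bool analyzed;

    this(StructInfo owner, string[] uses)
    {
        this.owner = owner;
        this.uses = uses;
    }
}

void addUnique(ref string[] list, string item)
{
    if (!list.canFind(item))
        list ~= item;
}

bool onStack(FuncInfo[] stack, FuncInfo f)
{
    foreach (g; stack)
        if (g is f)
            return true;
    return false;
}

void analyze(FuncInfo fn, ref FuncInfo[] stack)
{
    if (fn.analyzed)
        return;
    stack ~= fn;
    foreach (sym; fn.uses)
    {
        // Steps 1 and 6: walk outward through the enclosing structs.
        for (auto s = fn.owner; s !is null; s = s.parent)
        {
            if (s.fields.canFind(sym)) // step 2: a data member
            {
                addUnique(fn.needs, s.name);
                break;
            }
            if (auto p = sym in s.funcs) // steps 3 and 4: a function
            {
                FuncInfo callee = *p;
                // A callee already on the stack is recursive; it is
                // left for the second pass, which this sketch omits.
                if (!onStack(stack, callee))
                    analyze(callee, stack);
                foreach (n; callee.needs)
                    addUnique(fn.needs, n);
                break;
            }
            if (s.isStatic) // step 5: give up on enclosing instances
                break;
        }
    }
    stack = stack[0 .. $ - 1];
    fn.analyzed = true;
}

// Builds a small Dog/Body/Tail model and analyzes one Tail function.
string[] needsOf(string fname)
{
    auto dog = new StructInfo("Dog", null, ["brain", "bodi"]);
    auto bodi = new StructInfo("Body", dog, ["broken", "bladder", "tail"]);
    auto tail = new StructInfo("Tail", bodi, ["wagSpeed"]);
    tail.addFunc("wagTheDog", ["wagSpeed", "broken", "bladder", "brain"]);
    tail.addFunc("tryToWag", ["wagTheDog"]);
    auto fn = tail.funcs[fname];
    FuncInfo[] stack;
    analyze(fn, stack);
    return fn.needs;
}

void main()
{
    assert(needsOf("wagTheDog") == ["Tail", "Body", "Dog"]);
    assert(needsOf("tryToWag") == ["Tail", "Body", "Dog"]);
}
```

Note that a function which merely calls another one inherits the callee's whole pointer list, which is the point of steps 3 and 4.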

Now we have a list of hidden pointers to enclosing structs which 
the function must take. The function uses these pointers 
invisibly, giving potential access to all members of all parent 
types. To refer to the 'this' pointer of any one of these, 
'outer' may be used, then 'outer.outer', etc.

This is a complex new feature. I have therefore written an 
elaborate example to help to clarify how and when it might be 
used.

Meet Sparky(™), the most advanced electronic security dog the 
world will ever see. He's got a brain to house his advanced A.I. 
and a body to house his physics engine, which consists of a tail 
and a bladder. Sparky has stopped every intruder who ever crossed 
his path. He has no known weaknesses. Well, except for those 
pesky Jolt Brand Caffeinated Dog Biscuits. Feed him a Jolt and he 
just can't help himself. Here is his current implementation:

Dog sparky;
struct Dog {
   Brain brain;
   struct Brain
   {
     bool asleep = false;
     void think() {
       if(!asleep) {
         // Sparky has truly advanced A.I. and will stop
         // any intruder so long as he is awake
       }
     }
   }
   Body bodi;
   struct Body
   {
     bool broken = false;
     Bladder bladder;
     struct Bladder {
       void release() {
         // An absolutely fascinating implementation
       }
     }
     Tail tail;
     struct Tail {
       int wagSpeed = 0;
       void wag() { ++wagSpeed; }
     }
   }
   void jolt() {
      bodi.tail.wag;
      if (bodi.tail.wagSpeed >= 7) malfunction;
   }
   void malfunction() {
     bodi.broken = true;
     bodi.tail.wagSpeed = 0;
     bodi.bladder.release;
     brain.asleep = true;
   }
}

Note how function malfunction() is declared at the top level of 
struct Dog. It has to be, because its purpose is to respond to 
calamity by adjusting all the parts of the Dog. It would make 
more sense, however, to declare the functionality closer to its 
prime cause. This is how it would look with the new language 
feature. I have here renamed function malfunction to suit its new 
location:

Dog sparky;
struct Dog {
   . . .
   struct Body
   {
     . . .
     struct Tail {
       . . .
       // Used to be function malfunction()
       void wagTheDog()
       {
         wagSpeed = 0;
         broken = true;
         bladder.release;
         brain.asleep = true;
       }
     }
   }
   void jolt() {
      bodi.tail.wag;
      if (bodi.tail.wagSpeed >= 7) bodi.tail.wagTheDog;
   }
}

wagTheDog does not need to use the full names of bodi and tail, 
because they have been passed to it by hidden pointer in the 
original function call 'bodi.tail.wagTheDog'. In fact, this is 
the only way from the outside to call a nested struct function 
which uses its parents' data. The struct objects have no pointers 
to their parents, so they must be provided by fully naming them 
at the call site.

To illustrate more clearly, I'll show how the compiler rewrites 
function wagTheDog as a standard top-level function:

void wagTheDog(ref Dog __dog, ref Body __body, ref Tail __tail)
{
   __tail.wagSpeed = 0;
   __body.broken = true;
   __body.bladder.release;
   __dog.brain.asleep = true;
}

Because it causes confusion both for the programmer and the 
compiler, calling a nested function which uses its parents' data 
through an ad hoc struct object should probably be made illegal:

struct Dog {
   Brain brain;
   struct Body {
     Tail tail;
     struct Tail {
       void wagTheDog() { brain.asleep = true; }
       void tryToWag()
       {
         wagTheDog(); // Okay, fetches implicit pointers

         Tail tail;  // Ad hoc instance of Tail
         tail.wagTheDog(); // Error: Incomplete function call

         outer.tail = tail; // Okay, we've got a new tail
         wagTheDog(); // New tail attached. Works just fine

       }
     }
   }
}

wagTheDog detects that brain is a declaration two nests above and 
thus requires a Dog in order to be called. I think it is too much 
to demand that the compiler perform some kind of mix-and-match 
service as in the case of tail.wagTheDog(). It must simply detect 
this as a partial call and give an error. The workaround shown 
above is just as effective and not as confusing. Note also that 
tryToWag has inherited the need for a full set of pointers from 
the outside.

Just so you know how it works underneath, the compiler rewrites:

sparky.bodi.tail.wagTheDog;

as:

wagTheDog(sparky, sparky.bodi, sparky.bodi.tail);
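
The rewritten form can be written out by hand and run in today's D, which may help show that nothing exotic is going on underneath. This is only a sketch: the `released` flag and the helper joltedSparky are my own additions for the sake of having something to test.

```d
struct Dog
{
    struct Brain
    {
        bool asleep;
    }

    struct Body
    {
        struct Bladder
        {
            bool released; // hypothetical flag, only for this sketch
            void release() { released = true; }
        }

        struct Tail
        {
            int wagSpeed;
            void wag() { ++wagSpeed; }
        }

        bool broken;
        Bladder bladder;
        Tail tail;
    }

    Brain brain;
    Body bodi;

    void jolt()
    {
        bodi.tail.wag();
        if (bodi.tail.wagSpeed >= 7)
            wagTheDog(this, bodi, bodi.tail); // every level named by hand
    }
}

// Hand-written version of the rewrite: one explicit ref per nesting level.
void wagTheDog(ref Dog dog, ref Dog.Body bodi, ref Dog.Body.Tail tail)
{
    tail.wagSpeed = 0;
    bodi.broken = true;
    bodi.bladder.release();
    dog.brain.asleep = true;
}

// Feeds a fresh dog the given number of biscuits and returns him.
Dog joltedSparky(int biscuits)
{
    Dog sparky;
    foreach (i; 0 .. biscuits)
        sparky.jolt();
    return sparky;
}

void main()
{
    auto ok = joltedSparky(6);
    assert(!ok.bodi.broken && ok.bodi.tail.wagSpeed == 6);

    auto down = joltedSparky(7);
    assert(down.bodi.broken && down.brain.asleep);
    assert(down.bodi.bladder.released && down.bodi.tail.wagSpeed == 0);
}
```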

That's the feature. So what would the impact to the D language be 
with this new implementation of (non-static) nested structs?

First of all, would any code break? Well, if you examine how the 
suggested feature works, you'll see that the only source of 
breakage comes from duplicating a symbol both at module and at 
parent struct levels.

int hmmm = 3;
struct A {
   int hmmm = 2;
   B b;
   struct B {
     int f() { return hmmm; }
   }
}
A a;
assert(a.b.f == 2);

While the shadowing of variables might be the occasional source 
of bugs, you don't have to worry about getting access to parent 
fields because you can just use 'outer' to get a reference to a 
parent's 'this' field and '.' for the module. All told, it is an 
extraordinarily light form of code breakage, and I would not be 
surprised if it didn't break any code at all in most existing 
projects, since duplicating names inside nests is a bad practice 
anyway. Also, I don't know if any of the binary APIs insist on 
passing a pointer to member functions, even those which don't in 
fact use the data referred to, but if so, there will obviously be 
associated performance costs.
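
The '.' escape hatch mentioned above already works in today's D for ordinary member functions, so nothing new is needed there. A small sketch, with fromMember and fromModule as made-up names:

```d
int hmmm = 3; // module-level symbol

struct A
{
    int hmmm = 2; // member with the same name

    int fromMember() { return hmmm; }  // nearest scope wins: the field
    int fromModule() { return .hmmm; } // leading dot: module scope
}

void main()
{
    A a;
    assert(a.fromMember() == 2);
    assert(a.fromModule() == 3);
}
```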

And no, it's not an earth-shaking feature, but it does have a 
certain elegance to it, in my opinion, adding some flexibility 
and even some fun to using nested structs.

Part Two: Protons

Having examined the largest particle, let's move on to the second 
largest.

I'm sure everyone at one point has wanted to define a single 
instance of a structure without having to come up with both the 
name of the type and the name of the instance. Either you just 
want to whip up something quickly or you know for sure that you 
only need one instance. A syntax that facilitates this isn't 
going to get in your way when it's time to get "responsible" and 
declare a full-fledged type. The body of the declaration remains 
the same. It's just the declaration signature which has to change.

In terms of implementation, I could be wrong, but it seems rather 
trivial. Just define a new hidden type and create an instance of 
it using the name provided.

So how might D go about doing this for structs?

Well, anonymous structs already exist in the language, so that's 
certainly a good start. How about we just write the anonymous 
struct and then put the name of the single instance of the struct 
after it like we'd do with most other declarations?

struct {} foo;

Looks good to me. Except, of course, for the obvious fact that 
structs are never this short in real life. That 'foo' could come 
two thousand lines into the file for a particularly vicious 
single-instance struct.

There's got to be a way to move the name to the top while not 
leaving the syntax ambiguous as to what is being defined. What if 
we did something like:

alias foo struct {}

That would work. People could get used to it and eventually know 
by heart that when they saw 'alias xxxxx struct', they were 
working with a single-instance structure.

But while it is elegant, it's still a little noisy. What if you 
just took 'alias' away?

foo struct {}

Would that actually work? Let's see, if the parser sees an 
identifier, then 'struct'… yes, I think it *would* work.

It's king of the hill. Yes, it's rather high and mighty, but then 
again, it's a type which only has one instance. Maybe it 
*deserves* to be high and mighty. After all, "there can be only 
one." So I called it a Highlander, and I think it's a good 
syntax, although, once again, not exactly an earth-shattering 
feature.
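
The lowering I described earlier (define a hidden type, then create one instance under the given name) can be written by hand in today's D. In this sketch the type name FooImpl stands in for whatever name the compiler would invent, and the members are placeholders.

```d
// Stand-in for the hidden type the compiler would generate.
private struct FooImpl
{
    int count;
    void bump() { ++count; }
}

FooImpl foo; // the single named instance

void main()
{
    auto before = foo.count;
    foo.bump();
    assert(foo.count == before + 1);
}
```

The only difference with the Highlander syntax is that FooImpl would never be visible, so nobody could make a second one.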

Part Three: Electrons

This last particle is easy.

Emulating a built-in type with a struct object using opCall() can 
leak parentheses().

TrackedInt foo;
struct TrackedInt {
   private int _n;
   int timesAccessed;
   int opCall() { ++timesAccessed; return _n; }
}

foo; // Okay, we're tracking it, so it's not do-nothing code
foo(); // This doesn't look like an int…

The workaround is to use 'alias this' on the function you 
actually want in place of opCall:

struct TrackedInt {
   int someRandomFunction() { … }
   alias someRandomFunction this;
}
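
Filled in as a complete program, the workaround might look like the following. The constructor and the function name get are my assumptions; the original leaves the body elided.

```d
struct TrackedInt
{
    private int _n;
    int timesAccessed;

    this(int n) { _n = n; }

    // Every read of the value goes through here.
    int get()
    {
        ++timesAccessed;
        return _n;
    }

    alias get this; // lets a TrackedInt stand in wherever an int is wanted
}

void main()
{
    auto foo = TrackedInt(5);
    int a = foo;     // implicit conversion calls get()
    int b = foo + 1; // so does arithmetic
    assert(a == 5);
    assert(b == 6);
    assert(foo.timesAccessed == 2);
}
```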

But this could be made nicer, if it turns out we're doing this a 
lot. Why not just add operator opGet to the list of a struct's 
operator overloads?

struct TrackedInt
{
   int opGet() { … }
}

foo; // Okay
foo(); // Error: no opCall defined!

And that's it for these little particles.

Conclusion:

Structs needed to be whipped into shape to see how well they 
could do as built-in properties. Let's see how they did. If I'm 
at module scope, the language already provides a mechanism for 
structs-as-properties. Look at the following (partial) definition 
of std.array.front in today's D:

import std.traits;
Front front;
struct Front
{
   alias someFunction this;
   ref T someFunction(T)(T[] a)
   if (!isNarrowString!(T[]) && !is(T[] == void[]))
   {
      assert(a.length, "Attempting to fetch the front of an empty array of "
         ~ typeof(a[0]).stringof);
      return a[0];
   }
}
assert([1,2,3].front == 1);
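
For reference, this is how the real thing is used. As far as I know, in current Phobos the array overload of front lives in std.range.primitives and is a free function called through UFCS; firstOf and setFirst are throwaway helpers for the sketch.

```d
import std.range.primitives : front;

// Reads the first element with property-style syntax.
int firstOf(int[] a)
{
    return a.front;
}

// front returns by ref for int[], so it can also be assigned through.
void setFirst(int[] a, int value)
{
    a.front = value;
}

void main()
{
    int[] a = [1, 2, 3];
    assert(firstOf(a) == 1);
    setFirst(a, 10);
    assert(a[0] == 10);
}
```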

People are so used to the idea that structs operate on their own 
data that they don't realize that that's not the sine qua non of 
their existence. The compiler can easily figure out which 
pointers to which data it needs to include in its hidden fields. 
A property is a named set of overloaded operations on a piece of 
data which replaces the appearance of that data in code(™). 
Structs already perform this service for their own fields, and 
everyone seems to agree that this is a good thing. Why then 
should they not be expanded to be able to provide the same 
service for any data? It would spare the implementors from having 
to design a whole new mechanism, which wouldn't do anything that 
can't be done with structs anyway.

All three of the language features described above serve this 
function. The neutron makes it possible to nest the definition of 
'front' above, so it can now access its parent struct's data:

struct DomesticatedArray(T)
{
   private T[] _data;
   Front front;
   struct Front
   {
     alias someFunction this;
     ref T someFunction(T)()
     if (!isNarrowString!(T[]) && !is(T[] == void[]))
     {
        assert(_data.length, "Attempting to fetch the front of an empty array of "
           ~ typeof(_data[0]).stringof);
        return _data[0];
     }
   }
}
DomesticatedArray!int neutron = { [1,2,3] };
assert(neutron.front == 1);

But it's kind of awkward to define. That's where the Highlander 
syntax comes in:

struct DomesticatedArray(T)
{
   T[] _data;
   front struct
   {
     alias someRandomFunction this;
     ref T someRandomFunction(T)()
     if (!isNarrowString!(T[]) && !is(T[] == void[]))
     {
        assert(_data.length, "Attempting to fetch the front of an empty array of "
           ~ typeof(_data[0]).stringof);
        return _data[0];
     }
   }
}
DomesticatedArray!int nucleus = { [1,2,3] };
assert(nucleus.front == 1);

opGet finishes the job:

struct DomesticatedArray(T)
{
   T[] _data;
   front struct
   {
     ref T opGet(T)()
     if (!isNarrowString!(T[]) && !is(T[] == void[]))
     {
        assert(_data.length, "Attempting to fetch the front of an empty array of "
           ~ typeof(_data[0]).stringof);
        return _data[0];
     }
   }
}
DomesticatedArray!int atom = { [1,2,3] };
assert(atom.front == 1);
static assert(!__traits(compiles, atom.front())); // no opCall defined
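
For comparison, here is the closest thing I know how to write in today's D, using an ordinary @property member function. It is only a baseline sketch to hold the proposed syntax against, not part of the proposal, and frontOf is a made-up helper.

```d
struct DomesticatedArray(T)
{
    private T[] _data;

    // Plain member property; reaches _data through the usual `this`.
    @property ref T front()
    {
        assert(_data.length,
            "Attempting to fetch the front of an empty array of " ~ T.stringof);
        return _data[0];
    }
}

// Wraps the values and reads the first one with property syntax.
int frontOf(int[] values)
{
    auto wrapped = DomesticatedArray!int(values);
    return wrapped.front;
}

void main()
{
    assert(frontOf([1, 2, 3]) == 1);
}
```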

I think enhanced structs do pretty well as a replacement for 
explicit properties. Not only that, but each of the new features 
which make properties possible has a use or two of its own, 
totally apart from its effectiveness as a property replacement. I 
have attempted to prove that properties are nothing more than the 
sum of their component parts. The atom consists of protons, 
neutrons, and electrons.

Smash.