Classes or structs :: Newbie

Mon Dec 20 03:11:49 PST 2010

On Monday 20 December 2010 01:52:58 spir wrote:
> On Mon, 20 Dec 2010 01:29:13 -0800
> 
> Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> > > For me, the important difference is that classes are referenced, while
> > > structs are plain values. This is a semantic distinction of highest
> > > importance. I would like structs to be subtype-able and to implement
> > > (runtime-type-based) polymorphism.
> > 
> > Except that contradicts the fact that they're value types. You can't
> > have a type which has polymorphism and is a value type. By its very
> > nature, polymorphism requires you to deal with a reference.
> 
> Can you expand on this?
> 
> At least Oberon has value structs ("records") with inheritance and
> polymorphism; I guess the Turbo Pascal OO model was of that kind, too
> (unsure) -- at least the version implemented in freepascal seems to work
> fine that way. And probably loads of less known PLs provide such a
> feature. D structs could as well IIUC: I do not see the relation with
> instances being implicitly referenced. (Except that they must be passed
> by ref to "member functions" they are the receiver of, but this is true
> for any kind of OO, including present D structs.)
> 
> (I guess we have very different notions of "reference", as shown by
> previous threads.)

Okay. This can get pretty complicated, so I'm likely to screw up on some of the 
details, but this should give you a basic idea of what's going on.

In essentially any C-based language, when you declare an integer on the stack 
like so:

int a = 2;

you set aside a portion of the stack which is the exact size of an int 
(typically 32 bits, but that will depend on the language). If you declare a 
pointer,

int* a;

then you're setting aside a portion of the stack the size of a pointer (32 bits 
on a 32 bit machine and 64 bits on a 64 bit machine). That variable then holds 
an address - typically to somewhere on the heap, though it could be to an 
address on the stack somewhere. In the case of int*, the address pointed to will 
refer to a 32-bit block of memory which holds an int.
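
As a minimal sketch of the above in C++ (the function name is made up for 
illustration, and this assumes C++11 or later): a pointer variable has the same 
size whether the int it points to lives on the stack or on the heap.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: shows that an int* is the same kind of variable
// no matter where the int it points to is stored.
bool pointers_behave_the_same() {
    int on_stack = 2;                        // an int stored directly on the stack
    int* to_stack = &on_stack;               // pointer holding a stack address

    std::unique_ptr<int> owner(new int(2));  // an int stored on the heap
    int* to_heap = owner.get();              // pointer holding a heap address

    // Both pointer variables occupy the same amount of space,
    // and both lead to an int holding the value 2.
    assert(sizeof(to_stack) == sizeof(to_heap));
    return *to_stack == *to_heap;
}
```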

Suppose you have a struct or a class that you put on the stack, say,

class A
{
    int a;
    float b;
}

then you're setting aside exactly as much space as that type requires to hold 
itself. At minimum, that will be the total size of its member variables (in this 
case an int and a float, so probably a total of 64 bits), but it often will 
include extra padding to align the variables along appropriate boundaries for 
the sake of efficiency, and depending on the language, it could have extra type 
information. If the class has a virtual table (which it will if it has virtual 
functions, and in most languages other than C++ that means essentially every 
class), then the pointer to that table is part of the space required for each 
object as well.

Virtual functions are polymorphic: when you call a virtual function, it calls 
the version of the function for the actual type of the object rather than for 
the type of the pointer or reference that you're using to refer to the object. 
When a non-virtual function is called, the version for the type of the pointer 
or reference is used. All class functions are virtual in D unless the compiler 
determines that they don't have to be and optimizes that away (typically because 
they're final); struct functions and stand-alone functions are never virtual.

The exact memory layout of a type _must_ be known at compile time. The exact 
amount of space required is then known, so that the stack layout can be done 
appropriately.
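
To make the size discussion concrete, here is a rough C++ sketch. The type and 
constant names are invented for the example, and the actual numbers depend on 
the compiler and platform; the only point is that both sizes are fixed at 
compile time and that adding a virtual function makes each object bigger on 
mainstream implementations.

```cpp
#include <cassert>
#include <cstddef>

// Plain aggregate: just the two data members (plus any padding).
struct Plain {
    int a;
    float b;
};

// Same members plus a virtual function, so each object also carries
// hidden bookkeeping (typically a pointer to the virtual table).
struct WithVirtual {
    int a;
    float b;
    virtual ~WithVirtual() {}
};

// sizeof is evaluated by the compiler, so these are compile-time constants.
constexpr std::size_t plain_size = sizeof(Plain);
constexpr std::size_t virtual_size = sizeof(WithVirtual);

// Hypothetical check: the members always have to fit inside the object.
void check_layout() {
    assert(plain_size >= sizeof(int) + sizeof(float));
}
```

With a typical 64-bit compiler, Plain is likely to be 8 bytes and WithVirtual 
16, but that is an observation about common implementations, not a guarantee.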

If you're dealing with a pointer, then the exact memory layout of the memory 
being pointed to needs to be known when that memory is initialized, but the 
pointer doesn't necessarily need to know it. This means that you can have a 
pointer of one type point to a variable of another type. Now, assuming that 
you're not subverting the type system (e.g. by casting int* to float*), you're 
dealing with inheritance. For instance, you have

class B : A
{
    bool c;
}

and a variable of type A*. That pointer could point to an object which is 
exactly of type A, or it could point to any subtype of A. B is derived from A, 
so the object could be a B. As long as the functions are virtual, you can have 
polymorphic functions by having the virtual table used to call the version of 
the function for the type that the object actually is rather than the type that 
the pointer is.
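
A small C++ sketch of that behavior follows. The member functions name() and 
static_name() are made up for illustration; they aren't part of the A and B 
shown above.

```cpp
#include <cassert>
#include <string>

struct A {
    int a = 0;
    float b = 0.0f;
    virtual ~A() = default;
    // Virtual: looked up through the virtual table at run time.
    virtual std::string name() const { return "A"; }
    // Non-virtual: chosen from the type of the pointer at compile time.
    std::string static_name() const { return "A"; }
};

struct B : A {
    bool c = false;
    std::string name() const override { return "B"; }
    // Hides A::static_name rather than overriding it.
    std::string static_name() const { return "B"; }
};

// Calls the virtual function through an A* that really points to a B.
std::string name_via_base_pointer() {
    B object;
    A* p = &object;
    return p->name();
}

// Calls the non-virtual function through the same kind of pointer.
std::string static_name_via_base_pointer() {
    B object;
    A* p = &object;
    return p->static_name();
}

void check_dispatch() {
    assert(name_via_base_pointer() == "B");         // the object's type decides
    assert(static_name_via_base_pointer() == "A");  // the pointer's type decides
}
```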

References are essentially the same as pointers (though they may have some extra 
information with them, making them a bit bigger than a pointer would be in terms 
of the amount of space required on the stack). However, in the case of D, 
pointers are _not_ treated as polymorphic (regardless of whether a function is 
virtual or not), whereas references _are_ treated as polymorphic (why, I don't 
know - probably to simplify pointers). In C++ though, pointers are polymorphic.

Now, if you have a variable of type A*, you could do something like this:

B* b = new B();
A* a = b;

A* takes up 32 or 64 bits in memory and holds the memory location on the heap 
where the B object is. Both pointers have the same value and point to the same 
object. The only difference is how the compiler treats each type (e.g. you can't 
call a B function on the a variable). Calling A functions on the a variable will 
call the B version if it has its own version and the function is virtual. 
However, what about this:

B b;
A a = b;

The memory layout of b and a must be known at compile time. They're laid out 
precisely on the stack. b has the size of a B object. a has the size of an A 
object. a is _exactly_ an A. It cannot be a B. So, what you get is called 
object slicing. The A portions of the variable are assigned (in this case, the 
int and the float), whereas the B portions aren't assigned. a is now exactly as 
it would 
have been had you created it with its member variables having the same values 
that b's member variables from its A portion had. This is almost certainly _not_ 
what you wanted.

Now, because a is exactly an A, and b is exactly a B, when you go to call 
functions on them, it doesn't matter whether they're virtual or not. The type of 
the variable _is_ the type of the object. There is no polymorphism. You _need_ 
that level of indirection to get it.
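
Here is roughly what that looks like in C++. The name() function and the 
helper functions are invented for the example. After the copy, the A variable 
keeps the values from b's A portion but behaves entirely like an A, even for a 
virtual call.

```cpp
#include <cassert>
#include <string>

struct A {
    int a = 0;
    float b = 0.0f;
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
};

struct B : A {
    bool c = true;
    std::string name() const override { return "B"; }
};

// Copies a B into an A by value and reports which name() gets called.
std::string name_after_value_copy() {
    B b;
    A a = b;          // only the A portion of b is copied
    return a.name();  // a is exactly an A, so this is A::name
}

// The A portion itself is copied faithfully.
int int_after_value_copy() {
    B b;
    b.a = 7;
    A a = b;
    return a.a;
}

void check_value_copy() {
    assert(name_after_value_copy() == "A");
    assert(int_after_value_copy() == 7);
}
```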

Now, you could conceivably have a language where all of its objects were 
actually pointers, but they were treated as value types. So,

B b;
A a = b;

would actually be declaring

B* b;
A* a = b;

underneath the hood, except that the assignment would do a deep copy and 
allocate the appropriate memory rather than just copying the pointer, as would 
happen in a language like C++ or D. Perhaps that's what Oberon does. I have no 
idea. I have never heard of the language before, let alone used it. However, 
that's _not_ how C++, D, C#, or Java works. If you declare

B b;
A a = b;

then you are literally putting a B and an A on the stack, and assignments from a 
B to an A will cause slicing. D chose to avoid the slicing issue by making 
structs not have inheritance. This also means that they don't have a virtual 
table, which makes them more efficient. Classes have inheritance and a virtual 
table, but because they live on the heap and are accessed through references, 
you don't get slicing and polymorphism works just fine.
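
For what it's worth, the hypothetical "pointer underneath, value on the 
surface" scheme described above can be imitated by hand in C++. This is only a 
sketch of the idea: AValue, clone(), and the helper functions are names 
invented here, and nothing says that this is how Oberon or any other language 
actually does it.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct A {
    int a = 0;
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
    // Each type knows how to deep-copy itself onto the heap.
    virtual std::unique_ptr<A> clone() const {
        return std::unique_ptr<A>(new A(*this));
    }
};

struct B : A {
    bool c = false;
    std::string name() const override { return "B"; }
    std::unique_ptr<A> clone() const override {
        return std::unique_ptr<A>(new B(*this));
    }
};

// Hypothetical wrapper: used like a value, but holds a pointer inside.
class AValue {
public:
    AValue(const A& source) : ptr_(source.clone()) {}
    AValue(const AValue& other) : ptr_(other.ptr_->clone()) {}
    AValue& operator=(const AValue& other) {
        if (this != &other) ptr_ = other.ptr_->clone();  // deep copy
        return *this;
    }
    A& get() { return *ptr_; }
    const A& get() const { return *ptr_; }
private:
    std::unique_ptr<A> ptr_;
};

// The copy keeps the dynamic type B, so nothing is cut off.
std::string name_after_wrapped_copy() {
    B b;
    AValue first = b;
    AValue second = first;
    return second.get().name();
}

// Changing the copy does not affect the original.
bool copies_are_independent() {
    B b;
    AValue first = b;
    AValue second = first;
    second.get().a = 42;
    return first.get().a == 0 && second.get().a == 42;
}

void check_wrapped_copy() {
    assert(name_after_wrapped_copy() == "B");
    assert(copies_are_independent());
}
```

The cost is a heap allocation on every copy and an extra indirection on every 
access, which is exactly the overhead that plain stack values avoid.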

So, what it comes down to is that you can't have polymorphism for a stack object 
because you know _exactly_ what its type is, and you can't have inheritance for 
a stack object without risking slicing when assignments are made (unless you 
disallow assignments from one type of object to another unless they're the exact 
same type).

So, you're never going to see inheritance for structs in D. It doesn't fit its 
memory model at all. What you get instead are templates, which can be used to 
generate the same code for different types. And that's as close as you're going 
to get for polymorphism for structs.
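
As a rough illustration of the template approach, here it is in C++ (D 
templates differ in syntax but serve the same purpose here). Circle, Square, 
and doubled_area are made-up names. The two types are unrelated value types 
with no inheritance and no virtual table; the compiler simply generates one 
copy of the function per type.

```cpp
#include <cassert>

// Two unrelated value types that happen to offer the same function.
struct Circle {
    double radius;
    double area() const { return 3.14159265358979 * radius * radius; }
};

struct Square {
    double side;
    double area() const { return side * side; }
};

// One definition; a separate version is instantiated for each Shape type.
// The call to area() is resolved at compile time, not through a virtual table.
template <typename Shape>
double doubled_area(const Shape& shape) {
    return 2.0 * shape.area();
}

void check_templates() {
    Square square{3.0};
    Circle circle{1.0};
    assert(doubled_area(square) == 18.0);
    assert(doubled_area(circle) > 6.28 && doubled_area(circle) < 6.29);
}
```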

- Jonathan M Davis