Classes or structs :: Newbie
Jonathan M Davis
jmdavisProg at gmx.com
Mon Dec 20 03:11:49 PST 2010
On Monday 20 December 2010 01:52:58 spir wrote:
> On Mon, 20 Dec 2010 01:29:13 -0800
>
> Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> > > For me, the important difference is that classes are referenced, while
> > > structs are plain values. This is a semantic distinction of highest
> > > importance. I would like structs to be subtype-able and to implement
> > > (runtime-type-based) polymorphism.
> >
> > Except that contradicts the facts that they're value types. You can't
> > have a type which has polymorphism and is a value type. By its very
> > nature, polymorphism requires you to deal with a reference.
>
> Can you expand on this?
>
> At least Oberon has value structs ("records") with inheritance and
> polymorphism; I guess the Turbo Pascal OO model was of that kind, too
> (unsure) -- at least the version implemented in freepascal seems to work
> fine that way. And probably loads of lesser-known PLs provide such a
> feature. D structs could as well IIUC: I do not see the relation with
> instances being implicitly referenced. (Except that they must be passed
> by ref to "member functions" they are the receiver of, but this is true
> for any kind of OO, including present D structs.)
>
> (I guess we have very different notions of "reference", as shown by
> previous threads.)
Okay. This can get pretty complicated, so I'm likely to screw up on some of the
details, but this should give you a basic idea of what's going on.
In essentially any C-based language, when you declare an integer on the stack
like so:
int a = 2;
you set aside a portion of the stack which is the exact size of an int
(typically 32 bits, but that depends on the language and platform). If you
declare a pointer instead,
int* a;
then you're setting aside a portion of the stack the size of a pointer (32 bits
on a 32-bit machine and 64 bits on a 64-bit machine). That variable then holds
an address, typically to somewhere on the heap, though it could be to an
address on the stack somewhere. In the case of int*, the address pointed to
refers to a block of memory the size of an int (32 bits, assuming a 32-bit int)
which holds an int.
Now suppose you have a struct or a class that you put on the stack, say:
class A
{
int a;
float b;
}
then you're setting aside exactly as much space as that type requires to hold
itself. At minimum, that will be the total size of its member variables (in this
case an int and a float, so probably 64 bits in total), but it will often
include extra padding to align the variables along appropriate boundaries for
the sake of efficiency, and depending on the language, it could include extra
type information. If the class has a virtual table (which it will if it has
virtual functions; in most languages other than C++, every class does), then a
pointer to that table is part of the space required for each object as well.
Virtual functions are polymorphic: when you call a virtual function, the version
that runs is the one for the object's actual type rather than the one for the
type of the pointer or reference you're using to refer to the object. When a
non-virtual function is called, the version for the type of the pointer or
reference is used. All class member functions are virtual in D unless the
compiler determines that they don't have to be and optimizes the virtual call
away (typically because they're final); struct member functions and free
(stand-alone) functions are never virtual. The exact memory layout of a type
_must_ be known at compile time. The exact amount of space required is then
known, so that the stack layout can be done appropriately.
If you're dealing with a pointer, then the exact memory layout of the memory
being pointed to needs to be known when that memory is initialized, but the
pointer doesn't necessarily need to know it. This means that you can have a
pointer of one type point to a variable of another type. Assuming that
you're not subverting the type system (e.g. by casting int* to float*), that's
where inheritance comes in. For instance, you have
class B : A
{
bool c;
}
and a variable of type A*. That pointer could point to an object which is
exactly of type A, or it could point to any subtype of A. B is derived from A,
so the object could be a B. As long as the functions are virtual, calls through
the pointer are polymorphic: the virtual table is used to call the version of
the function for the type that the object actually is rather than the type
named by the pointer.
References are essentially the same as pointers (though they may have some extra
information with them, making them a bit bigger than a pointer would be in terms
of the amount of space required on the stack). However, in the case of D,
pointers are _not_ treated as polymorphic (regardless of whether a function is
virtual or not), whereas references _are_ treated as polymorphic (why, I don't
know - probably to simplify pointers). In C++ though, pointers are polymorphic.
So in C++, if you have a variable of type A*, you could do something like this:
B* b = new B();
A* a = b;
a takes up 32 or 64 bits on the stack and holds the address on the heap
where the B object is. Both pointers have the same value and point to the same
object. The only difference is how the compiler treats each type (e.g. you can't
call a B-only function through a). Calling an A function through a will call
B's version if B overrides it and the function is virtual.
However, what about this:
B b;
A a = b;
The memory layout of b and a must be known at compile time. They're laid out
precisely on the stack. b has the size of a B object. a has the size of an A
object. a is _exactly_ an A. It cannot be a B. So, what you get is called
slicing. The A portions of the variable are assigned (in this case, the int and
the float), whereas the B portions aren't. a is now exactly what it would have
been had you created an A whose member variables held the same values as the A
portion of b. This is almost certainly _not_ what you wanted.
Now, because a is exactly an A, and b is exactly a B, when you go to call
functions on them, it doesn't matter whether they're virtual or not. The type of
the variable _is_ the type of the object. There is no polymorphism. You _need_
that level of indirection to get it.
Now, you could conceivably have a language where all of its objects were
actually pointers, but they were treated as value types. So,
B b;
A a = b;
would actually be declaring
B* b;
A* a = b;
underneath the hood, except that the assignment would do a deep copy and
allocate the appropriate memory rather than just copying the pointer, as would
happen in a language like C++ or D. Perhaps that's what Oberon does. I have no
idea. I have never heard of the language before, let alone used it. However,
that's _not_ how C++, D, C#, or Java work. In C++, if you declare
B b;
A a = b;
then you are literally putting a B and an A on the stack, and assigning a B to
an A causes slicing. D chose to avoid the slicing issue by not giving structs
inheritance. This also means that they don't have a virtual table, which makes
them more efficient. Classes have inheritance and a virtual table, but because
they're always handled through references (and live on the heap), you don't get
slicing, and polymorphism works just fine.
So, what it comes down to is that you can't have polymorphism for a stack object
because you know _exactly_ what its type is, and you can't have inheritance for
a stack object without risking slicing when assignments are made (unless you
only allow assignments between objects of exactly the same type).
So, you're never going to see inheritance for structs in D. It doesn't fit its
memory model at all. What you get instead are templates, which can be used to
generate the same code for different types. And that's as close as you're going
to get to polymorphism for structs.
- Jonathan M Davis
More information about the Digitalmars-d-learn
mailing list