general questions on reference types versus value types...

Sun Nov 30 21:51:00 PST 2014

On Mon, Dec 01, 2014 at 04:42:36AM +0000, WhatMeWorry via Digitalmars-d-learn wrote:
> 
> Is it correct to say that D reference types (classes, dynamic arrays,
> etc.) are always allocated on the heap; whereas D value types
> (structs, static arrays, etc.) are always allocated on the stack?  Or
> is this a gross oversimplification?

It's somewhat of an oversimplification, as you *can* allocate by-value
types on the heap, e.g., `MyStruct* ptr = new MyStruct(...)`. But it's
rare to want to do that; usually if you need to do that, you should just
use a class instead. There's also emplace, which can place class objects
on the stack (or structs on the heap), and static arrays that are class
members end up on the heap along with the rest of the object, since they
are embedded in it.
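
To make that concrete, here's a minimal sketch of both cases. The names
Holder and pointsInto are made up for this example (they're not from any
library), and it assumes the default GC allocation you get from `new`:

```d
struct MyStruct {
	int x;
}

class Holder {	// hypothetical class, for illustration only
	int[4] data;	// static array: embedded in the class instance
}

// Hypothetical helper: true if p points inside the memory block of o,
// where size is the instance size of o's class.
bool pointsInto(const(void)* p, Object o, size_t size) {
	auto base = cast(const(ubyte)*) cast(const(void)*) o;
	auto q = cast(const(ubyte)*) p;
	return q >= base && q < base + size;
}

void main() {
	MyStruct* ptr = new MyStruct;	// a struct allocated on the heap
	ptr.x = 42;
	assert(ptr.x == 42);

	auto h = new Holder;
	// The static array lives inside the heap-allocated Holder object.
	assert(pointsInto(h.data.ptr, h, __traits(classInstanceSize, Holder)));
}
```

I've left emplace out of the sketch on purpose, since getting the buffer
size and alignment right is a topic of its own.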

> Because can't structures contain classes and classes contain
> structures?  If so, how should one thinks about these hybrid types?
> Does the outermost type take precedent over what ever it contains?

That's not a very useful way to think about it. A better way to think
about it is, value type == "allocated right here", and reference type ==
"allocated elsewhere". For example, if you have a struct:

	struct S {
		int x;
	}

Then when you declare a variable of type S in main(), the contents of S
are allocated "right here", that is, on the stack as the function
executes:

	void main() {
		S s; // allocated "right here", i.e., on the stack
	}

If you declare a member variable of type S, the contents of S are
embedded in the enclosing type:

	class C {
		S s;	// s is part of the contents of C
	}

	struct T {
		S s;	// s is part of the contents of T
	}

You could illustrate it with a diagram:

	+---class C---+
	| +---S s---+ |
	| | int x;  | |
	| +---------+ |
	+-------------+

The member s is part of the block of memory that an instance of C
resides in.
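
If you want to convince yourself of the embedding, a quick sketch like
the following should do it. It only relies on .sizeof and
__traits(classInstanceSize, ...); nothing here is specific to any
particular compiler as far as I know:

```d
struct S {
	int x;
}

struct T {
	S s;	// embedded by value
}

class C {
	S s;	// embedded in the class instance
}

void main() {
	// T holds nothing but its S member, so the two have the same size.
	static assert(T.sizeof == S.sizeof);

	// A C instance must be at least big enough to hold its S member
	// (it also carries hidden bookkeeping fields of its own).
	static assert(__traits(classInstanceSize, C) >= S.sizeof);

	// Copying a T copies the embedded S along with it.
	T a;
	a.s.x = 1;
	T b = a;
	b.s.x = 2;
	assert(a.s.x == 1);	// a's embedded S is untouched
}
```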

A class, OTOH, is allocated "elsewhere", so when you declare a variable
of type C, the variable is not the object itself, but a reference
pointing to somewhere else:

	void main() {
		C c = new C();
	}

	On the stack:           On the heap:
	+------C c------+       +---class C---+
	| <reference> --------> | +---S s---+ |
	+---------------+       | | int x;  | |
	                        | +---------+ |
	                        +-------------+

As you can see, the variable c is actually on the stack, but it doesn't
contain the actual object. Instead, it points elsewhere -- to the heap
where the object is allocated.
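
Here's a small sketch of what that means in practice, reusing the S and
C definitions from above (repeated so the snippet compiles on its own):

```d
struct S {
	int x;
}

class C {
	S s;
}

void main() {
	// The variable itself is only a reference, the size of a pointer.
	static assert(C.sizeof == (void*).sizeof);

	C c = new C();
	C d = c;	// copies the reference, not the object
	d.s.x = 5;
	assert(c.s.x == 5);	// same object, seen through c
	assert(c is d);

	C e;	// no `new`, so there is no object yet
	assert(e is null);
}
```

Note the last two lines: declaring a class variable without `new` gives
you a null reference, not an object.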

Now what happens if we put a class inside a struct?

	struct U {
		int y;
		C c;
	}

	void main() {
		U u;
		u.c = new C(); // a class reference is null until you assign it
	}

	On the stack:              On the heap:
	+----U u------------+      +---class C---+
	| int y;            |      | +---S s---+ |
	| +---C c---------+ |      | | int x;  | |
	| | <reference> ---------->| +---------+ |
	| +---------------+ |      +-------------+
	+-------------------+

So you see, u is an interesting kind of object; it is allocated *both*
on the stack and on the heap! The 'int y' part of it is on the stack, as
well as the reference part of 'c', but the contents of 'c' are not on
the stack, but on the heap.  The stack part of U has value semantics,
while the c part has reference semantics -- for example, when you do
this:

	U v = u;

v will actually contain a *copy* of 'int y', but its 'C c' member will
actually point to the *same* instance of C as u:

	On the stack:              On the heap:
	+----U u------------+      +---class C---+
	| int y;            |      | +---S s---+ |
	| +---C c---------+ |      | | int x;  | |
	| | <reference> ---------->| +---------+ |
	| +---------------+ |      +-------------+
	+-------------------+             ^
	                                  |
	+----U v------------+             |
	| int y;            |             |
	| +---C c---------+ |             |
	| | <reference> ------------------+
	| +---------------+ |
	+-------------------+

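So now, modifying u.y and v.y will not interfere with each other, but
modifying the object that u.c refers to (e.g., u.c.s.x) will show up
through v.c, and vice versa, because they are actually referencing the
same object on the heap. (Reassigning u.c itself to a different object
only changes u's copy of the reference, though.)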
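
A minimal sketch of that behaviour, with the type definitions repeated
so it compiles by itself:

```d
struct S {
	int x;
}

class C {
	S s;
}

struct U {
	int y;
	C c;
}

void main() {
	U u;
	u.c = new C();
	U v = u;	// copies y and the reference c

	v.y = 10;
	assert(u.y == 0);	// the by-value part is independent

	v.c.s.x = 7;
	assert(u.c.s.x == 7);	// the referenced object is shared

	v.c = new C();	// rebinding v's reference...
	assert(u.c !is v.c);	// ...leaves u's reference alone
	assert(u.c.s.x == 7);
	assert(v.c.s.x == 0);
}
```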

You might be wondering why you'd want to do something like this, but
this is exactly what makes D slices so handy: a dynamic array is
actually internally implemented as a struct that has a by-value member,
and a reference member:

	struct _d_array(T) {
		size_t length;
		T* ptr;
	}

When you take a slice of an array, it just copies the struct and
modifies the .length and .ptr fields appropriately, but the two copies
of the struct reference the same underlying array on the heap. The
important thing is that the value semantics of _d_array ensure that the
*original* slice is unchanged, yet the reference semantics of _d_array
let you modify the original array through the slice. For example:

	void main() {
		int[] a = [1,2,3]; // original array
		int[] b = a[0 .. 1]; // slice of original array

		assert(a == [1,2,3]); // original slice is not modified

		b[0] = 4; // but you can modify the array contents via the new slice
		assert(a[0] == 4); // and it shows up in the original array
	}
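
You can also peek at the two fields directly through the .length and
.ptr properties. This is just a sketch; the size check assumes the usual
length-plus-pointer layout described above:

```d
void main() {
	// A slice is as big as one length field plus one pointer.
	static assert((int[]).sizeof == size_t.sizeof + (int*).sizeof);

	int[] a = [1, 2, 3];
	int[] b = a[1 .. 3];	// slice starting at the second element

	assert(b.length == 2);
	assert(b.ptr == a.ptr + 1);	// points into the same memory as a
	assert(a.length == 3);	// a's own fields are untouched

	b[0] = 20;	// writes through to the shared array
	assert(a == [1, 20, 3]);
}
```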

> Can't resist this. Maybe I should just create a play code, but could a
> Structure contain a class that contained a structure that contained a
> class that...  Not sure why one would ever need to, so just a
> theoretical question.
[...]

I assure you it's not just a theoretical question. You use it every day,
in the form of array slices; you just didn't realize it. :-)  Consider,
for example, the structure of an array of arrays:

	int[][] arr;

The variable 'arr' itself is a _d_array struct containing a by-value
field .length, and a .ptr field that references an array of _d_array
structs, each of which contains a by-value .length field (which is on
the heap, btw!) and a .ptr field that references yet another place on
the heap where the subarray contents are kept. That is to say, the
.length fields are "allocated right here" (which could be on the stack
for the arr variable itself, or on the heap when it's one of the
subarray slices), and the .ptr fields point to data which is "allocated
elsewhere" (i.e., on the heap by default, but it doesn't have to be --
you could emplace() it somewhere else, the point is that it's not
embedded into the surrounding context).
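
Here's a rough sketch of how those layers behave. The variable names
(row, other) are mine, picked just for this example:

```d
void main() {
	int[][] arr = [[1, 2], [3, 4, 5]];

	int[] row = arr[1];	// copies one inner slice struct
	row[0] = 30;
	assert(arr[1][0] == 30);	// the element data is shared

	row = row[0 .. 1];	// changes only the local copy's fields
	assert(row.length == 1);
	assert(arr[1].length == 3);	// the inner slice on the heap is unchanged

	int[][] other = arr;	// copies only the outer slice struct
	other[0] = [9];	// overwrites an inner slice struct on the heap
	assert(arr[0] == [9]);	// so arr sees the change too
}
```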

T

-- 
Live a century, learn a century. And you'll still die a fool.