dcollections 1.0 and 2.0a beta released

Sun May 23 08:01:57 PDT 2010

On 2010-05-22 16:01:37 -0400, Walter Bright <newshound1 at digitalmars.com> said:

> Michel Fortin wrote:
>> What's the point of having extra indirection here?
> 
> Good question. I think the answer is:
> 
> 1. When do you ever want to copy a collection? I almost never do, 
> because copying one is an inherently expensive operation.

Whenever you need to preserve the previous state of something before 
applying some transformation on it. But I agree that the copy should be 
explicit because it is O(n), hence my suggestion of disabling implicit 
copying for containers.
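
As a rough sketch of what I mean by disabling implicit copying, 
assuming a made-up IntList container (the type and its members are 
hypothetical, for illustration only; they are not part of dcollections 
or Phobos):

```d
// Hypothetical value-type container, for illustration only.
struct IntList
{
    private int[] items;

    // Disable the postblit so that `auto b = a;` is rejected at compile time.
    @disable this(this);

    void add(int x) { items ~= x; }

    size_t length() const { return items.length; }

    // The explicit O(n) copy: duplicates the underlying storage.
    IntList dup() const
    {
        IntList copy;
        copy.items = items.dup;
        return copy;
    }
}

void main()
{
    IntList a;
    a.add(1);
    a.add(2);

    // auto b = a;    // would not compile: IntList is not copyable
    auto b = a.dup;   // the copy is spelled out at the call site
    b.add(3);

    assert(a.length == 2);   // the previous state is preserved
    assert(b.length == 3);
}
```

The O(n) cost is then visible wherever a copy is made, and passing the 
container to a function requires 'ref'.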

While we're at it, a reference-type container sometimes makes it too 
easy to just create a new reference to the same container when what you 
really want is to make a copy. I happen to have a bug of this sort to 
fix in my Objective-C program right now, where a reference to a 
container leaked where it should have been a copy, causing unwanted 
mutations.

> 2. When you copy a collection, do you copy the container or the 
> elements in the container or both? One would have to carefully read the 
> documentation to see which it is. That increases substantially the 
> "cognitive load" of using them.

I don't see the extra cognitive load. Assuming you disable implicit 
copying of the container, you'll have to use ".dup", which will work 
exactly as it does for an array. The way items are copied is exactly 
the same as if the container were a reference type (you call a "dup" 
or equivalent function and things get copied).

> 3. Going to lengths to make them value types, but then disabling 
> copying them because you want people to use them as reference types, 
> seems like a failure of design somewhere.

I agree in a way. But at the same time, forcing everyone to use a 
reference type when sometimes a value type would be more adequate also 
looks like a failure of design to me. To me, the best tradeoff seems to 
be to use a value type, because it's quite trivial to create a 
reference type from a value type when you need it; the reverse is 
awkward.
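
To illustrate the "trivial" direction, here is a minimal sketch, again 
using a made-up IntList struct (hypothetical, not from any library). 
Allocating the value type on the heap gives you a pointer that behaves 
like a reference:

```d
// Hypothetical value-type container, for illustration only.
struct IntList
{
    int[] items;
    void add(int x) { items ~= x; }
}

void main()
{
    IntList v;                  // plain value, stored in place
    v.add(1);

    IntList* r = new IntList;   // heap-allocated: r acts as a reference
    IntList* s = r;             // copies the pointer, not the container
    s.add(42);                  // member access goes through the pointer

    assert(r.items.length == 1);   // r and s share the same container
    assert(v.items.length == 1);   // v is unaffected
}
```

Going the other way, from a class-based container to one stored in 
place inside another object, has no equally direct equivalent.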

> 4. That silly extra level of indirection actually isn't there. Consider 
> that even value locals are accessed via indirection: offset[ESP]. For a 
> reference collection we have: offset[EBX]. No difference (yes, EBX has 
> to be loaded, but if it is done more than once it gets cached in a 
> register by the compiler).

Have you ever worked with containers of containers? Surely yes since D 
associative arrays are one of them. So assume we want to implement our 
associative arrays like this:

	class HashTable(Key, Value) {
		Array!(Tuple!(Hash!Key, TreeSet!(Tuple!(Key, Value)))) buckets;
	}

Do you find it reasonable that the TreeSet be a reference type?

Reference-type containers would mean one indirection and one extra 
allocated block for each bucket. Then add that 'Value' could itself be 
a struct or class containing its own container, and you're stuck with a 
third unnecessary level of indirection and extra calls to the GC to 
allocate containers and/or check for null. Sounds quite wasteful to me. 
In addition, those extra allocations add more logic to our hash table 
and thus more chances for bugs.

Here I'm using a hash table as an example, but the same problem applies 
to many other data structures, whether they are generic or specific to 
a particular problem. Containers should be efficient and easy to use 
when composed one inside another. That's the greatest strength of C++ 
value-type containers in my opinion.

> 5. Just making them all reference types means the documentation and use 
> become a lot simpler.

Simpler to explain, maybe. Simpler to use, I have my doubts. You're 
just moving the burden somewhere else. A reference-type container 
requires a "new Container()" somewhere, and some protection logic 
against null. In exchange, you don't need to write 'ref' in functions 
taking containers, and can easily copy references to the container 
everywhere (sometimes too easily). But the reference-type benefits 
aren't entirely lost with a value type, because it's trivial to turn 
a value type into a reference type.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/