Kinds of containers

Wed Oct 21 07:21:44 PDT 2015

On 10/21/2015 01:05 PM, Andrei Alexandrescu wrote:
> I'm finally getting the cycles to get to thinking about Design by
> Introspection containers. First off, there are several general
> categories of containers. I think D should support all properly. One
> question is which we go for in the standard library.
>
> 1. Functional containers.
>
> These are immutable; once created, neither their topology nor their
> elements may be observably changed. Manipulating a container entails
> creating an entire new container, often based on an existing container
> (e.g. append takes a container and an element and creates a whole new
> container).
>
> Internally, functional containers take advantage of common substructure
> and immutability to share actual data. The classic resource for defining
> and implementing functional containers is
> http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504.
>
> ...

I still think those should be mutable by default in order to have 
painless interchangeability with other value type containers. Why should 
corresponding ephemeral and persistent containers have different interfaces?

I assume you envision code using those to look as follows?

FunSet!int a;
a=a.with_(1);
auto b=a;
a=a.with_(2);
a=a.with_(3);
// b = {1}, a={1,2,3};

I think this should be allowed too:

FunSet!int a;
a.insert(1);
auto b=a;
a.insert(2);
a.insert(3);
// b = {1}, a={1,2,3};

One of the two versions should be automatically implemented via UFCS.

> 2. Reference containers.
>
> These have classic reference semantics (à la Java). Internally, they may be implemented either as class objects or as reference counted structs.
>
> They're by default mutable. Qualifiers should apply to them gracefully.
>
> 3. Eager value containers.
>
> These are STL-style. Somewhat surprisingly I think these are the worst of the pack; they expensively duplicate at the drop of a hat and need to be carefully passed around by reference lest performance silently drops. Nevertheless, when used as members inside other data structures value semantics might be the appropriate choice. Also, thinking of them as values often makes code simpler.
>
> By default eager value containers are mutable. They should support immutable and const meaningfully.

>
> 4. Copy-on-write containers.
>
> These combine the advantages of value and reference containers: you get
> to think of them as values, yet they're not expensive to copy. Copying
> only occurs by necessity upon the first attempt to change them.
> ...

IMO "1." ought to combine the advantages of value and reference 
containers as well, just without any expensive copying at all, even when 
updates happen.

> The disadvantage is implementations get somewhat complicated. Also, they
> are shunned in C++ because there is no proper support for COW; for
> example, COW strings have been banned starting with C++11 which is quite
> the bummer.
>
> Together with Scott Meyers, Walter figured out a way to change D to
> support COW properly. The language change consists of two attributes.
>
> =======
>
> I'll attempt to implement a few versions of each and see what they look
> like. The question here is what containers are of interest for D's
> standard library.
> ...

List:
  - forward iteration
  - bidirectional iteration

Stack:
  - basic stack
  - ordered stack [0]

Queue:
  - basic queue
  - heap

Set:
- hash set
- ordered set
- accumulating set [1]
- trie/radix tree
+ multiset versions

Map:
- hash map
- ordered map
- accumulating map [1]
- accumulating map with range update [2]
- trie/radix tree
- accumulating trie [1]
- accumulating trie with range update [2]
+ multimap versions
- array
- accumulating array [1]
- accumulating array with range update [2]
+ O(1) reset versions [3]
- rope
- accumulating rope [1]
- accumulating rope with range update [2]

[0] ordered stack: Push operations automatically pop the minimal number 
of elements off the stack prior to pushing, such as to guarantee that 
the elements on the stack remain ordered. The stack should expose a 
sorted range in order to support binary search.

[1] accumulation: The (ordered!) data structure allows fast queries for 
the result of some binary associative operations on the elements in a 
certain range. (the allowed operations are determined in advance and 
some intermediate results are automatically maintained). (for map, the 
operation would be just on the values, not on the keys.) This is usually 
quite easy support, but very useful.

[2] range update: here, the idea is that the data structure allows all 
elements in a certain range to be updated. the updates are performed 
lazily and have to be compatible with the associative operations (if any).

[3] fast reset: here the idea is that the map allows fast reset of its 
values at the cost of some small additional overhead per lookup. 
(destructors are called lazily.)