Proposal for "Classic DList" (CDList)

Tue Nov 6 04:18:45 PST 2012

I'd like to make a proposal for a "Classic" DList.

The MAIN reason I think the current one is inadequate is because 
of its current semantics, as discussed here:
http://forum.dlang.org/thread/gjhclwsuqyhrimdeoaec@forum.dlang.org

And documented here:
https://github.com/D-Programming-Language/phobos/pull/910

Basically, a DList is not really a container. It is a view into a 
chain of nodes. All the DLists share the same chain, but each 
DList may have a different view of the chain. This is a really 
strange semantic I've never seen anywhere else, and quite 
franckly, I'm not even sure this was the original intent 
anyways...

So I set up to write CDList (Classic DList). The 6 points that 
distinguish it from DList are:
1. Explicit Reference Semantics.
2. Optional deterministic memory model.
3. Completeness of implementation.
4. Assertive invalidation.
5. Splice (!)

To be perfectly fair, 3 & 4 can be built into our current model. 
splice too, but with more difficulty.

I'll elaborate on each point:

1. Explicit Reference Semantics.
We explicitly state that ranges point to a payload, that can be 
shared, and need to be initialized. I don't believe in lazy 
initialization, hence the "explicit". The bonus here though is 
that we can compare and assign a CDlist to null. For example:

--------
     CDList!int a;
     assert(a == null);
     CDList!int b;
     assert(a == b);
     assert(a is b);

     a = b;
     assert(a == b);
     assert(a is b);

     auto c = CDList!int();
     assert(c != null);
     assert(a != c);
     assert(a !is c);

     a = CDList!int();
     assert(a == c);
     assert(a !is c);

     a = null; //release a
--------

Long story short: Class semantics, but without the polymorphism.

2. Optional deterministic memory model.
This is the money argument. I've inserted an optional template 
parameter that defines if allocation should be done by the GC, or 
manually. They have their pros and cons:
-GC: + Direct access to node content references (front and back 
return by ref).
-GC: + references to nodes are never invalidated (although ranges 
can be).
-Malloc: + Nodes are released as soon as they are removed from 
the range.
-Malloc: - No reference access to the node contents
-Malloc: - "Remove" and "~this()" are not BIGOH(1) anymore (since 
it triggers the destruction of the nodes).

Here is an example of the Malloc model in action:
--------
     struct Per//son
     {
         int* p;
         this(int i){p = [i].ptr; ++pop;}
         this(this){p = [*p].ptr; ++pop;}
         ~this(){if(p){p = null;--pop;}}
     }
     alias CDList!(Per, CDListGCAllocation.no) Peoples;
     {
         Peoples.Range rr;
         {
             auto a = Peoples(Per(1), Per(5), Per(6));
             auto b = Peoples(Per(3), Per(4), Per(9));
             auto c = Peoples();
             assert(pop == 6);

             a.removeBack();
             assert(pop == 5);

             auto r = a[].take(2);
             a.spliceAfter(r, b[].take(2));
             assert(pop == 5);

             b.clear();
             assert(pop == 4);

             rr = c.splice(a[].take(2));
         }
         assert(pop == 2); //still held by rr
     }
     assert(pop == 0);
--------

With a little before taste of "splice" :)
As you can see, each person gets destroyed as soon as their node 
leaves the scope. Interesting note: An extracted range will 
single handedly prevent the destruction of a CDList's content, so 
the behavior is safe too.

3. Completeness of implementation.
This just means that *any* function that takes a "Range" as an 
argument will also work with a "Take!Range".
Also, *any* function that takes "stuff" will accept stuff as:
-a value,
-an indiscriminate list of values
-a range.

So for example: "d.insertAfter(d[].take(3), 1, 2, 3)" is 
perfectly legal (and super convenient).

In all fairness, this could be implemented back into our DList.

4. Assertive invalidation.
You'd be amazed the shit DList gets away with. You know all that 
"stableRemove" it has? Yeah, not stable at all. What's more, when 
you *do* invalidate a range, chances are you'll never even notice 
it. My implementation invalidates the shit out each and every 
nodes individually (in version(assert) or course), and catches 
any invalid code almost instantaneously. That may not sound like 
much, but manipulating lists is tricky business.

5. Splice (!)

That's right! What's a DList without a splice? I think I've 
antroduced a semantic that would be to Andrei's taste: 
"spliceFront", "spliceBack", "spliceBefore" and "spliceAfter". 
Simple as that. The ricky business with splice though is that the 
input range will not really be invalidated, but WILL change 
owner. Because of this, "spliceXXX" will:
"return a range spanning the spliced nodes, that is owned by the 
current range". Here is a simple example:
--------
     auto a = IntList(1, 2, 6);
     auto b = IntList(3, 4, 5);
     auto r = a[]; //r is owned by a
     auto r = b.splice(r); //r is now owned by b;
--------

Arguments can, Of course, be either a Range or a Take!Range. 
Operations are o(1) for Range. In any case, nodes are never 
constructed of course.

here it is in more exciteing action:
--------
     auto a = IntList(1, 2, 5);
     auto b = IntList(3, 4, 9);
     auto r = a.spliceAfter(a[].take(2), b[].take(2));
     assert(a[].equal([1, 2, 3, 4, 5]));
     assert(b[].equal([9]));
     assert(r.equal([3, 4]));
--------

Self splice is, of course, possible. I love the expressiveness of 
this one liner:
--------
     auto a = IntList(3, 1, 2, 4);
     a.spliceFront(a[].drop(1).take(2));
     assert(a[].equal([1, 2, 3, 4]));
--------

-------------------------------------------------
The implementation is in this here:
https://github.com/monarchdodra/phobos/commits/DList2/std/container.d
https://github.com/monarchdodra/phobos/commit/2292071a862ecb0b1f9b77d1413c5d9b9c7b24e1#std/container.d

What do you think? I'm not sure where to take this: I can't just 
change DList into CDList, because, well, breakage. If we decide 
to change it to reference semantics, but with lazy 
initialization, it can be done though...

Also, container duplication is a bad idea. Still as it stands, 
DList is an EXTREMELLY tricky container to use...

So what to do?