std.container update

Thu May 27 07:23:24 PDT 2010

On 05/27/2010 08:23 AM, Steven Schveighoffer wrote:

You're making a number of great points in the four posts starting with
this. Since they sort of augment one another, let me quote and answer
them all here.

> On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> I just implemented a singly-linked list type to illustrate the
>> container abstraction.
>>
>> http://erdani.com/d/phobos/std_container.html
>> http://erdani.com/d/phobos/container.d
>>
>> Please let me know of how you find it. There are a few refinements
>>  and ancillary functions to define, but those are pretty clear
>> given the rest.
>
> Well, one thing I don't like about it is that you cannot do O(1)
> insertion or removal. insertBefore requires O(n) complexity and
> insertAfter requires O(m) complexity. Removal will require O(n)
> complexity, even though it doesn't exist right now.
>
> Fast removal and insertion are key to having a linked list. I'd
> suggest modifying the range so it uses a Node ** instead, which
> points to the _next element of the previous node of the one currently
> being iterated, or _root if it's the first node in the list.
>
> -Steve

The performance profile of SList is what one would expect for a
straightforward singly-linked list implemented with nodes, pointer to
root, and the such. You are making a good point that adding one 1-2
extra indirections would lead to improvements of certain primitives.
There have been various experiments with STL-like slist implementations
that dwell on such ideas. See e.g.
http://manpages.ubuntu.com/manpages/lucid/man3/std::forward_list.3cxx.html
with functions like before_begin() in addition to begin(), and also
insert_after() instead of insert().

It has become clear to STL implementors that adding hidden indirections
puts forward_list at a disadvantage compared to the straightforward
lists against which they compete. I want to avoid that by abiding to the
straight storage strategy and deal with its consequences.

Second post:

> In fact, I'd say that it would be critical to split the insert and
> remove functions to slow (O(n) or greater) versions and fast( O(lg(n)
> or less)) functions.  Some algorithms may depend on fast insertion
> and removal, such as mergesort on a linked list.

This is a good idea, which generalizes linearRemove(). I'd suspected if
there's one place to distinguish linear from better, then there ought to
be more.

 From here we have a fork in the road:

a) Tighten complexity requirements for e.g. insertBefore(Range, Stuff)
and let containers that can't implement them just not have them.

b) Define linearInsertBefore(Range, Stuff) etc.

I'm leaning towards (a) because a list can insert virtually anywhere by
only using insertAfter(Take!Range, Stuff) (which I haven't defined yet
but I will, and which also takes care of the complexity issue) and 
insertFront(Stuff).

Third post:

> Actualy mergesort could not be implemented without some sort of way
> to restructure the list without allocating new nodes, requiring
> knowledge of the link structure.  I'm not sure it could be
> generic... I'll think about this a bit.

You are entirely right. A mergesort that relies on link shuffling needs
to have intimate knowledge of the node structure. It would most likely
be a member of the list.

Fourth post:

> I have another general question in general about these collections.
>
> Ranges fix a very bad problem with iterators -- non-matching begin
> and end iterators.  For example, passing a begin from one list and
> and end from another.

Right.

> However, all your range functions such as remove, insertBefore, etc.
> are called via the container type, using a range as an argument.  But
> there can be no static verification that a range actually is part of
> a given container.  Should there be some requirement that a container
> validates that a range is part of the container before performing the
> operation?  This is the case for dcollections.
>
> In your current implementation of SList, verification seems
> incidental for insertBefore because you must find the previous node
> from the root anyways (and that will throw on enforce if it cannot
> find it).  But insertAfter does not, so something like this will
> compile, run, and result in strange behavior:
>
> auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this
> correct, but it's not important auto sl2 = make(SList!int, 6, 7, 8,
> 9, 10);
>
> auto r1 = sl1[]; r1.popFront(); r1.popFront();
>
> auto r2 = take(r1, 2); sl2.insertAfter(r2, 666); // actually inserts
> into sl1

It's a good question.

My current view of the matter is to maximize library flexibility and 
versatility. Following the adage that you can build safe on fast but not 
the other way around, I'd go with the following philosophy: checking of 
range membership to their collections should be made only to the extent 
of ensuring memory safety. Beyond that, if efficiency is in tension with 
verifiability (as is the case with your example above), leave it to the 
programmer or the supra-structure to correctly pair collections with 
their ranges.

Sounds good?

Andrei