RFC on range design for D2

Mon Sep 8 16:16:18 PDT 2008

On Tue, 09 Sep 2008 01:50:54 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Hello,
>
>
> Walter, Bartosz and myself have been hard at work trying to find the  
> right abstraction for iteration. That abstraction would replace the  
> infamous opApply and would allow for external iteration, thus paving the  
> way to implementing real generic algorithms.
>
> We considered an STL-style container/iterator design. Containers would  
> use the newfangled value semantics to enforce ownership of their  
> contents. Iterators would span containers in various ways.
>
> The main problem with that approach was integrating built-in arrays into  
> the design. STL's iterators are generalized pointers; D's built-in  
> arrays are, however, not pointers, they are "pairs of pointers" that  
> cover contiguous ranges in memory. Most people who've used D gained the  
> intuition that slices are superior to pointers in many ways, such as  
> easier checking for validity, higher-level compact primitives,  
> streamlined and safe interface. However, if STL iterators are  
> generalized pointers, what is the corresponding generalization of D's  
> slices? Intuitively that generalization should also be superior to  
> iterators.
>
> In a related development, the Boost C++ library has defined ranges as  
> pairs of two iterators and implemented a series of wrappers that accept  
> ranges and forward their iterators to STL functions. The main outcome of  
> Boost ranges been to decrease the verboseness and perils of naked  
> iterator manipulation (see  
> http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a  
> C++ application using Boost could avail itself of containers, ranges,  
> and iterators. The Boost notion of range is very close to a  
> generalization of D's slice.
>
> We have considered that design too, but that raised a nagging question.  
> In most slice-based D programming, using bare pointers is not necessary.  
> Could then there be a way to use _only_ ranges and eliminate iterators  
> altogether? A container/range design would be much simpler than one also  
> exposing iterators.
>
> All these questions aside, there are several other imperfections in the  
> STL, many caused by the underlying language. For example STL is  
> incapable of distinguishing between input/output iterators and forward  
> iterators. This is because C++ cannot reasonably implement a type with  
> destructive copy semantics, which is what would be needed to make said  
> distinction. We wanted the Phobos design to provide appropriate answers  
> to such questions, too. This would be useful particularly because it  
> would allow implementation of true and efficient I/O integrated with  
> iteration. STL has made an attempt at that, but istream_iterator and  
> ostream_iterator are, with all due respect, a joke that builds on  
> another joke, the iostreams.
>
> After much thought and discussions among Walter, Bartosz and myself, I  
> defined a range design and reimplemented all of std.algorithm and much  
> of std.stdio in terms of ranges alone. This is quite a thorough test  
> because the algorithms are diverse and stress-test the expressiveness  
> and efficiency of the range design. Along the way I made the interesting  
> realization that certain union/difference operations are needed as  
> primitives for ranges. There are also a few bugs in the compiler and  
> some needed language enhancements (e.g. returning a reference from a  
> function); Walter is committed to implement them.
>
> I put together a short document for the range design. I definitely  
> missed about a million things and have been imprecise about another  
> million, so feedback would be highly appreciated. See:
>
> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
>
> Andrei

1) There is a typo:

// Copies a range to another
void copy(R1, R2)(R1 src, R2 tgt)
{
     while (!src.isEmpty)
     {
         tgt.putNext(r.getNext);  // should be tgt.putNext(src.getNext);
     }
}

2) R.next and R.pop could have better names. I mean, they represent  
similar operations yet names are so different.

3) Walter mentioned that built-in array could be re-implemented using a  
pair of pointers instead of ptr+length. Will it ever get a green light? It  
fits range concept much better.

4) We need some way of supporting dollar notation in user containers. The  
hack of using __dollar is bad (although it works).

5) I don't quite like names left and right! :) I think they should  
represent limits (pointers to begin and end, in case of array) rather that  
values. In this case, built-in arrays could be implemented as follows:

struct Array(T)
{
     T* left;
     T* right;
     size_t length() { return right-left; }
     ref T opIndex(size_t index) { return left[index]; }
     // etc
}

The rationale behind having access to range limits is to allow operations  
on them. For example,
R.left-=n;

could be used instead of
foreach(i; 0..n) {
     R.pop();
}

which is more efficient in many cases.

Other that that - great, I like it.