RFC on range design for D2

Mon Sep 8 14:50:54 PDT 2008

Hello,

Walter, Bartosz and myself have been hard at work trying to find the 
right abstraction for iteration. That abstraction would replace the 
infamous opApply and would allow for external iteration, thus paving the 
way to implementing real generic algorithms.

We considered an STL-style container/iterator design. Containers would 
use the newfangled value semantics to enforce ownership of their 
contents. Iterators would span containers in various ways.

The main problem with that approach was integrating built-in arrays into 
the design. STL's iterators are generalized pointers; D's built-in 
arrays are, however, not pointers, they are "pairs of pointers" that 
cover contiguous ranges in memory. Most people who've used D gained the 
intuition that slices are superior to pointers in many ways, such as 
easier checking for validity, higher-level compact primitives, 
streamlined and safe interface. However, if STL iterators are 
generalized pointers, what is the corresponding generalization of D's 
slices? Intuitively that generalization should also be superior to 
iterators.

In a related development, the Boost C++ library has defined ranges as 
pairs of two iterators and implemented a series of wrappers that accept 
ranges and forward their iterators to STL functions. The main outcome of 
Boost ranges been to decrease the verboseness and perils of naked 
iterator manipulation (see 
http://www.boost.org/doc/libs/1_36_0/libs/range/doc/intro.html). So a 
C++ application using Boost could avail itself of containers, ranges, 
and iterators. The Boost notion of range is very close to a 
generalization of D's slice.

We have considered that design too, but that raised a nagging question. 
In most slice-based D programming, using bare pointers is not necessary. 
Could then there be a way to use _only_ ranges and eliminate iterators 
altogether? A container/range design would be much simpler than one also 
exposing iterators.

All these questions aside, there are several other imperfections in the 
STL, many caused by the underlying language. For example STL is 
incapable of distinguishing between input/output iterators and forward 
iterators. This is because C++ cannot reasonably implement a type with 
destructive copy semantics, which is what would be needed to make said 
distinction. We wanted the Phobos design to provide appropriate answers 
to such questions, too. This would be useful particularly because it 
would allow implementation of true and efficient I/O integrated with 
iteration. STL has made an attempt at that, but istream_iterator and 
ostream_iterator are, with all due respect, a joke that builds on 
another joke, the iostreams.

After much thought and discussions among Walter, Bartosz and myself, I 
defined a range design and reimplemented all of std.algorithm and much 
of std.stdio in terms of ranges alone. This is quite a thorough test 
because the algorithms are diverse and stress-test the expressiveness 
and efficiency of the range design. Along the way I made the interesting 
realization that certain union/difference operations are needed as 
primitives for ranges. There are also a few bugs in the compiler and 
some needed language enhancements (e.g. returning a reference from a 
function); Walter is committed to implement them.

I put together a short document for the range design. I definitely 
missed about a million things and have been imprecise about another 
million, so feedback would be highly appreciated. See:

http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html

Andrei