RFC on range design for D2
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Mon Sep 8 17:24:27 PDT 2008
Sergey Gromov wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:
>> Walter, Bartosz and myself have been hard at work trying to find the
>> right abstraction for iteration. That abstraction would replace the
>> infamous opApply and would allow for external iteration, thus paving the
>> way to implementing real generic algorithms.
>
> opApply() wasn't my hero either. :) Your article really looks like
> something I'd expect to find in D. It only requires foreach support,
> and yeah, return by reference.
Indeed. Both are in the works.
>> I put together a short document for the range design. I definitely
>> missed about a million things and have been imprecise about another
>> million, so feedback would be highly appreciated. See:
>>
>> http://ssli.ee.washington.edu/~aalexand/d/tmp/std_range.html
>
> - next operation not mentioned in section 3, forward ranges.
Thanks! Fixed.
> - the union operations look... weird. Unobvious. I'm too sleepy now to
> propose anything better but I'll definitely give it a try. The rest of
> the interface seems very natural.
I agree I hadn't known what primitives would be needed when I sat down.
Clearly there was a need for some since individual iterators are not
available anymore. New ideas would be great; I suggest you validate them
by implementing some nontrivial algorithms in std.algorithm with your,
um, computational basis of choice :o).
> - what's a collection? How do you get a range out of there? Collection
> should be a range itself, like an array. But it shouldn't be destroyed
> after passing it to foreach(). How to achieve this if foreach()
> essentially uses getNext()?
These are great questions, I'm glad you asked. The way I see things, D2
ranges can be of two kinds: owned and unowned. For example D1's ranges
are all unowned:
auto a = new int[100];
...
This array is unowned because it going out of scope leaves the
underlying array in place. Now consider:
scope a = new int[100];
In this case the array is owned by the current scope. Scoped data is a
very crude form of ownership that IMHO brought more trouble than it
solved. It's a huge hole smack in the middle of everything, and we plan
to revisit it as soon as we can.
A better design would be to define collections that own their contents.
For example:
Array!(int) a(100);
This time a does own the underlying array. You'd be able to get ranges
all over it:
int[] b = a.all;
So now we have two nice notions: Arrays own the data. Ranges walk over
that data. An array can have many ranges crawling over it. But two
arrays never overlap. The contents of the array will be destroyed (BUT
NOT DELETED) when a goes out of scope.
What's the deal with destroyed but not deleted? Consider:
int[] a;
if (condition) {
Array!(int) b;
a = b.all;
}
writeln(a);
This is a misuse of the array in that a range crawling on its back has
survived the array itself. What should happen now? Looking at other
languages:
1) All Java objects are unowned, meaning the issue does not appear in
the first place, which is an advantage. The disadvantage is that scarce
resources must be carefully managed by hand in client code.
2) C++ makes the behavior undefined because it destroys data AND
recycles memory as soon as the array goes out of scope. Mayhem ensues.
We'd like:
1.5) Allow object ownership but also make the behavior of incorrect code
well-defined so it can be reasoned about, reproduced, and debugged.
That's why I think an Array going out of scope should invoke destructors
for its data, and then obliterate contents with ElementType.init. That
way, an Array!(File) will properly close all files AND put them in a
"closed" state. At the same time, the memory associated with the array
will NOT be deallocated, so a range surviving the array will never crash
unrelated code, but instead will see closed files all over.
In the case of int, there is no destructor so none of that happens.
Surviving ranges will continue looking at the contents, which is now
unowned.
So there is a difference in the way data with destructors and data
without destructors is handled. I don't like that, but this is the most
reasonably effective design I came up with so far.
About the "collection should be a range itself" mantra, I've had a
micro-epiphany. Since D's slices so nicely model at the same time arrays
and their ranges, it is very seductive to think of carrying that to
other collection types. But I got disabused of that notion as soon as I
wanted to define a less simple data structure. Consider a matrix:
auto a = BlockMatrix!(float, 3)(100, 200, 300);
defines a block contiguous matrix of three dimensions with the
respective sizes. Now a should be the matrix AND its range at the same
time. But what's "the range" of a matrix? Oops. As soon as you start to
think of it, so many darn ranges come to mind.
* flat: all elements in one shot in an arbitrary order
* dimension-wise: iterate over a given dimension
* subspace: iterate over a "slice" of the matrix with fewer dimensions
* diagonal: scan the matrix from one corner to the opposite corner
I guess there are some more. So before long I realized that the most
gainful design is this:
a) A matrix owns its stuff and is preoccupied with storage internals,
allocation, and the such.
b) The matrix defines as many range types as it wants.
c) Users use the ranges.
For example:
foreach (ref e; a.flat) e *= 1.1;
foreach (row; a.dim(0)) row[0, 0] = 0;
foreach (col; a.dim(1)) col[1, 1] *= 5;
and so on.
Inevitably naysayers will, well, naysay: D defined a built-in array, but
it also needs Array, so built-in arrays turned out to be useless. So how
is that better than C++ which has pointers and vector? Walter has long
feared such naysaying and opposed addition of user-defined array types
to Phobos. But now I am fully prepared to un-naysay the naysayers:
built-in slices are a superior building block to naked pointers. They
are in fact embodying a powerful concept, that of a range. With ranges
everything there is can be built efficiently and safely. Finally,
garbage collection helps by ensuring object ownership while preserving
well-definedness of incorrect code.
Andrei
More information about the Digitalmars-d-announce
mailing list