RFC on range design for D2

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Sep 9 03:50:06 PDT 2008


Sergey Gromov wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:
>> Sergey Gromov wrote:
>>> - what's a collection?  How do you get a range out of there?  Collection 
>>> should be a range itself, like an array.  But it shouldn't be destroyed 
>>> after passing it to foreach().  How to achieve this if foreach() 
>>> essentially uses getNext()?
>> These are great questions, I'm glad you asked. The way I see things, D2 
>> ranges can be of two kinds: owned and unowned. For example D1's ranges 
>> are all unowned:
>>
>> [snip]
>>
>> A better design would be to define collections that own their contents. 
>> For example:
>>
>> Array!(int) a(100);
>>
>> This time a does own the underlying array. You'd be able to get ranges 
>> all over it:
>>
>> int[] b = a.all;
> 
> I really don't like to have basic language constructs implemented as 
> templates.  It's like Tuple!() which is sorta basic type but requires 
> template trickery to really work with it.

Well I guess we disagree on a number of issues here. The problem with
"sorta basic" types is that the list could go on forever. I'd rather use
a language that allows creation of good types from a small core, instead
of one that tries to supplant all sorta basic types it could think of.

>> So now we have two nice notions: Arrays own the data. Ranges walk over 
>> that data. An array can have many ranges crawling over it. But two 
>> arrays never overlap. The contents of the array will be destroyed (BUT 
>> NOT DELETED) when a goes out of scope.
> 
> This invalidates the idea of safe manipulations with data no matter 
> where you've got that data from.

Manipulation remains typesafe. The problem is that sometimes we want to
ensure timely termination of certain resources.

>> About the "collection should be a range itself" mantra, I've had a 
>> micro-epiphany. Since D's slices so nicely model at the same time arrays 
>> and their ranges, it is very seductive to think of carrying that to 
>> other collection types. But I got disabused of that notion as soon as I 
>> wanted to define a less simple data structure. Consider a matrix:
>>
>> auto a = BlockMatrix!(float, 3)(100, 200, 300);
>>
>> defines a block contiguous matrix of three dimensions with the 
>> respective sizes. Now a should be the matrix AND its range at the same 
>> time. But what's "the range" of a matrix? Oops. As soon as you start to 
>> think of it, so many darn ranges come to mind.
> 
> If you cannot think of a natural default range for your collection---
> well, it's your decision not to implement range interface for it.  But 
> if it does have a natural iteration semantics, it should be possible to 
> implement:

It's not that I can't think of one. It's that I think of too many.

> auto a = new File("name");
> auto b = new TreeSet!(char);
> 
> foreach(ch; a)
>   b.insert(ch);
> 
> foreach(ch; b)
>     writeln("unique char ", ch);
> 
> Here is the problem.  First foreach() naturally and expectedly changes 
> the state of an object a.  Second foreach() naturally and expectedly 
> does not make changes to object b.
> 
> Solution:
> 
> File is an Input range in your notion.  It supports isEmpty() and 
> getNext(), it is non-copyable (but, note, referenceable).

You left a crucial detail out. What does getNext() return?

In the new std.stdio design, a File is preoccupied with opening,
closing, and transferring data for the underlying file. On top of it
several input ranges can be constructed - that read lines, blocks, parse
text, format text, and so on. (One thing I want is to allow
std.algorithm to work with I/O easily and naturally.)

I fully understand there can be so many design choices in handling all
this stuff, it's not even funny. I can't get excited about an equivalent
solution that to me has no obvious advantages. I can even less get
excited about a solution that I have objections with. That doesn't mean
someone else can't get excited over it, and probably rightly so.

> TreeSet is a Collection, which you don't discuss.  It implements opSlice
> () without arguments, which is required and sufficient to define a 
> collection.  opSlice() must return a range that, at least, supports 
> input range operations.
> 
> foreach() checks if a passed object implements opSlice() so that it can 
> iterate non-destructively. If no opSlice() is provided, it falls back to 
> getNext().
> 
>> a) A matrix owns its stuff and is preoccupied with storage internals, 
>> allocation, and the such.
>>
>> b) The matrix defines as many range types as it wants.
>>
>> c) Users use the ranges.
> 
> No problem.  The matrix lives as long as a range refers to it.  As 
> expected.

I, too, wish reference counting is a solution to everything.


Andrei


More information about the Digitalmars-d-announce mailing list