range practicle use

Thu Dec 30 09:19:33 PST 2010

On 12/30/10 6:41 AM, spir wrote:
> Hello,
>
>
> In the course of a project (1) 2 partner D programmers and myself are
> currently implementing, we faced 2 issues which prevented us using a
> range interface as planned. We initially intended to do it for better
> compliance with D's coming new style, and nice inter-relation with
> Phobos modules like std.algorithm. Instead, we use opApply to
> implement traversal; which does the job. This post's purpose is to
> expose these problems to help and making range interfaces better
> usable, or simply usable, in various practical cases. (Opinions are
> mine --I don't known my partners' exact position, except that they
> indeed agree these topics plainly prevent us using ranges.)
[snip]

Thanks very much for taking the time to document your experience. This 
is very useful.

> -1- textual output
> Functions like writeln silently use the formatValue set of templates
in std.format, to get a textual form for the element to be output.
Template constraints determine which version is used for a given kind of
element. As implemented now:
> * There are formatValue variants for elements that implement a range
interface (which produce an array-like form).
> * Constraint selection for a range interface has high precedence;
especially it shortcuts any toString function explicitely implemented by
the programmer.
>
> This prevents a user to define any adequate output format for the
elements of a custom type, if the type also happens to implement a range
interface. A critical bug, and an absolute blocker for us.

Seeing "absolute blocker" here makes me think: is there /absolutely no 
way/ to get that particular task done in the D programming language? I 
might be off here because I don't have details, but my knee-jerk 
reaction is:

struct MyRange { ... }
void print(MyRange r) { ... }

Of course it would be great if you could use the predefined 
write/writeln functions, and we should improve them, but I have a hard 
time considering this an absolute blocker. You write a function once and 
then you call it. I do that in my C++ code all the time. Iostreams are 
terrible (to be euphemistic), but they don't prevent me and others from 
using C++.

> Actually, this cannot even work in numerous cases. In our project, the range's ElementType (as returned by opindex&  front) would happen to be identical to the range type itself. In this case, the formatting algorithm via a range runs into an infinite loop.

Ha, glad you mention that. When I first designed ranges I had this fear 
in a couple of places: if a range is its own element type, then there's 
a problem. I deferred my solution about that to later. Apparently that 
later has come now.

> It is instead necessary to use the programmer-defined format in toString (or later writeTo).
> In the example below, trying to write() an element yields an infinite series of "[[[...", in trying to print it like an array.

Yep, yep, great feedback. This is a fixed point in the iteration 
typeof(r) -> typeof(r.front), which is not difficult to detect. One 
question would be - what to do when we _do_ detect that?

(One thing that should be kept in mind is that ranges are printed the 
way they are simply because I had to choose _something_. There is 
definitely a lot of room for improvement there in many aspects.)

> Note that it is a common situation: for instance, (textual) string types like ones in high-level languages do not expose a separate element type (character); instead, a character is just a singleton of the type. Even more commonly, list, tree, graph nodes usually are of the same type as the container (or the root/container is of a sub-type implementing global methods).

This is interesting, but I find it difficult to imagine concrete cases 
because in all my code the iteration strategy is encoded in the range's 
type. For example, splitter() returns a range that iterates a string by 
words. Arguably the input and output have the same type, but the type of 
the range is _not_ string, it's Splitter!string or whatever. Similarly, 
for a self-referential data structure, the range type would be e.g. 
InOrder!TreeNode, PostOrder!TreeNode etc. I guess there are legitimate 
cases in which the TreeNode has front() that returns itself and so on, 
so we need to address this.

> Another problem is that if the type is a class (as opposed to a struct), then defining a range interface introduces a template-selection conflict with the standard class case: as of now, they are not mutually exclusive, leading to compile-time error.

I don't understand this, could you please give a bit of detail?

> Anyway, what is the need for a range-specific output format? And why should it exactly look like an array (misleading)?

Because I needed to put something there and couldn't think of something 
better...

> Structs and classes also implementing ranges can define toString (later writeTo) for an accurate format; and in case this would be the right choice, reproducing an array-like format is a few lines of code.

Agreed. writeTo() with a delegate taking const(char)[] has been 
discussed and I think is a good solution.

> -2- indexed iteration
>
> It seems there is no way to directly define indexed iteration using ranges, like commonly needed by:
>      foreach(index,element ; collection) {...}

Aha! The time of reckoning has come :o).

> A possible but inadequate workaround would be to define a tool type for this:
>      struct TraversalPair (Element) {uint index ; Element element;}
> Then define the range's element output routines (opIndex, front, back) to return pairs instead of elements; and use this like:
>      foreach (pair ; collection) {actWith(pair.element, pair.index);}
> But this requires a client programmer to know this particularity; and anyway does not fit D common style and practice, I guess.

Clearly that's an annoying limitation of ranges when compared against 
opApply, and we need to fix that. That being said, I see your example 
uses the index as a simple counter 0, 1, 2, ... so again putting my 
"could this or could this not be done in the D programming language?" I 
can't stop thinking about this:

size_t index = 0;
foreach (e; collection) {
     ...
     ++index;
}

You need to mind the presence of "continue" and the larger scope of 
"index", but these are simple matters.

Annoying? Definitely. Makes ranges unusable? Perhaps not as much.

> How to solve this practically? I would be happy with the above workaround if it became a commonly used solution, supported by the stdlib and if necessary by the core language. This may scale by defining the tool type as a superclass instead, allowing various variants, possibly with more elements.
> Maybe an alternative is to use tuples: allow variants of opIndex, front, and back, to return (index,element) tuples and let the compiler use these overloads when the client code requests 2 (or more) returned values.
> The first solution is more explicite, and possibly general; the second may fit common practice better if/when tuples become widely used (a rather far&  hypothetical future ;-) ?).

I'd think tuples are the simplest solution and the basis of the language 
change that fixes this insufficiency.

First, if all you want is an index you can write this:

foreach (e; zip(iota(r.length), r))
{
     // e[0] is 0, 1, 2, ...
     // e[1] is the current range element
}

If the range doesn't have a length you can use iota with the largest 
number - iteration will stop at the shortest:

foreach (e; zip(iota(size_t.max), r))
{
     // e[0] is 0, 1, 2, ...
     // e[1] is the current range element
}

If such uses of iota become common we could make iota() with no 
arguments just go 0, 1, 2,..., size_t.max, 0, 1, 2,...

Second, if you want to do some cleverer stuff than just 0, 1, 2,... you 
could always have r.front yield a tuple:

struct MyRange
{
     @property Tuple!(string, double) front() { ... }
     ...
}

foreach (e; r)
{
     // e[0] is the current string
     // e[1] is the current double
}

This is also the basis of a language solution. Currently the language 
lowers foreach for ranges down to a loop that uses front(), popFront(), 
and empty(). We discussed in this group a while ago that foreach with 
multiple iteration elements (e.g. a, b, c) could be lowered by binding a 
to r.front[0], b to r.front[1], and c to r.front[2 .. $]. This is 
compatible with the current lowering and allows convenient usage with 
ranges that have either tuples or arrays as their front.

Thanks again for this report. We definitely need to work on all of the 
above. Let me ask you this - did the project actually need to use 
algorithms in std.algorithm? Based on my experience, my assumption is 
that you didn't need the more complex algorithms (e.g. sort or 
bringToFront) at all, and that you didn't have intensive use for the 
simpler algorithms (map, reduce) either. This is because if you did, 
ranges would have had enough strategic advantage for you, and 
consequently you would have been much more inclined to work around the 
issues mentioned above. As such, it's possible that you started with 
ranges because they sounded nice to have but didn't quite find a solid 
need for their advantages over e.g. opApply. Please let me know - thanks.

Andrei