range practicle use

Thu Dec 30 04:41:18 PST 2010

Hello,

In the course of a project (1) 2 partner D programmers and myself are currently implementing, we faced 2 issues which prevented us using a range interface as planned. We initially intended to do it for better compliance with D's coming new style, and nice inter-relation with Phobos modules like std.algorithm. Instead, we use opApply to implement traversal; which does the job.
This post's purpose is to expose these problems to help and making range interfaces better usable, or simply usable, in various practical cases. (Opinions are mine --I don't known my partners' exact position, except that they indeed agree these topics plainly prevent us using ranges.)
To illustrate the situation, please see an artificial example coded below (the real case would require explaining several irrelevant points).

-1- textual output
Functions like writeln silently use the formatValue set of templates in std.format, to get a textual form for the element to be output. Template constraints determine which version is used for a given kind of element. As implemented now:
* There are formatValue variants for elements that implement a range interface (which produce an array-like form).
* Constraint selection for a range interface has high precedence; especially it shortcuts any toString function explicitely implemented by the programmer. 

This prevents a user to define any adequate output format for the elements of a custom type, if the type also happens to implement a range interface. A critical bug, and an absolute blocker for us.

Actually, this cannot even work in numerous cases. In our project, the range's ElementType (as returned by opindex & front) would happen to be identical to the range type itself. In this case, the formatting algorithm via a range runs into an infinite loop. It is instead necessary to use the programmer-defined format in toString (or later writeTo).
In the example below, trying to write() an element yields an infinite series of "[[[...", in trying to print it like an array.

Note that it is a common situation: for instance, (textual) string types like ones in high-level languages do not expose a separate element type (character); instead, a character is just a singleton of the type. Even more commonly, list, tree, graph nodes usually are of the same type as the container (or the root/container is of a sub-type implementing global methods).

Another problem is that if the type is a class (as opposed to a struct), then defining a range interface introduces a template-selection conflict with the standard class case: as of now, they are not mutually exclusive, leading to compile-time error.

Anyway, what is the need for a range-specific output format? And why should it exactly look like an array (misleading)? Structs and classes also implementing ranges can define toString (later writeTo) for an accurate format; and in case this would be the right choice, reproducing an array-like format is a few lines of code. The case of ranges cannot compare to the one of arrays (which are _only_ arrays, which form a much more narrow kind of objects, and for which one cannot define any format). For all these reasons, I would happily get rid of these range-specific formats, that introduce complication, issues and bugs.
Or: these issues must be solved by complicating constraints (in particular, the range case must check no toString is defined).

[you can comment on issue http://d.puremagic.com/issues/show_bug.cgi?id=5354]

-2- indexed iteration

It seems there is no way to directly define indexed iteration using ranges, like commonly needed by:
    foreach(index,element ; collection) {...}

A possible but inadequate workaround would be to define a tool type for this:
    struct TraversalPair (Element) {uint index ; Element element;}
Then define the range's element output routines (opIndex, front, back) to return pairs instead of elements; and use this like:
    foreach (pair ; collection) {actWith(pair.element, pair.index);}
But this requires a client programmer to know this particularity; and anyway does not fit D common style and practice, I guess.

How to solve this practically? I would be happy with the above workaround if it became a commonly used solution, supported by the stdlib and if necessary by the core language. This may scale by defining the tool type as a superclass instead, allowing various variants, possibly with more elements.
Maybe an alternative is to use tuples: allow variants of opIndex, front, and back, to return (index,element) tuples and let the compiler use these overloads when the client code requests 2 (or more) returned values.
The first solution is more explicite, and possibly general; the second may fit common practice better if/when tuples become widely used (a rather far & hypothetical future ;-) ?).

Thank you,
Denis

=== example using range ============================================
struct Text {
    uint length;
    private char[] chars;
    private uint index;     // for input range
    // construction
    this (string s=null) {
        this.length = s.length;
        this.chars = s.dup;
    }
    this (char ch) {
        this.length = 1;
        this.chars = [ch];
    }
    // output
    string toString () {
        return format(`"%s"`, this.chars);
    }
    // range
    string opIndex (uint index) {
        return String(this.chars[index]);
    }
    @property void popFront () {
        ++ this.index;
    }
    @property bool empty () {
        return (this.index >= this.length);
    }
    @property String front () {
        return String(this.chars[this.index]);
    }
    // ...operational methods...
}

=== solution using opApply =========================================
struct Text {
    uint length;
    private char[] chars;
    private uint index;     // for input range
    // construction
    this (string s=null) {
        this.length = s.length;
        this.chars = s.dup;
    }
    this (char ch) {
        this.length = 1;
        this.chars = [ch];
    }
    // output
    string toString () {
        return format(`"%s"`, this.chars);
    }
    // indexing
    string opIndex (uint index) {
        return String(this.chars[index]);
    }
    // traversal (both plain & indexed uses)
    int opApply (int delegate (ref Text) block) {
        // not considering result
        foreach (_,ch ; this.chars)
            block(Text(ch));
        return 0;
    }
    int opApply (int delegate (ref uint, ref Text) block) {
        // not considering result
        foreach (i,ch ; this.chars)
            block(i, Text(ch));
        return 0;
    }
    // ...operational methods...
}
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com