range practicle use

Thu Dec 30 10:58:37 PST 2010

On Thu, 30 Dec 2010 11:19:33 -0600
Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:

> On 12/30/10 6:41 AM, spir wrote:
> > Hello,
> >
> >
> > In the course of a project (1) 2 partner D programmers and myself are
> > currently implementing, we faced 2 issues which prevented us using a
> > range interface as planned. We initially intended to do it for better
> > compliance with D's coming new style, and nice inter-relation with
> > Phobos modules like std.algorithm. Instead, we use opApply to
> > implement traversal; which does the job. This post's purpose is to
> > expose these problems to help and making range interfaces better
> > usable, or simply usable, in various practical cases. (Opinions are
> > mine --I don't known my partners' exact position, except that they
> > indeed agree these topics plainly prevent us using ranges.)
> [snip]
> 
> Thanks very much for taking the time to document your experience. This 
> is very useful.
> 
> > -1- textual output
> > Functions like writeln silently use the formatValue set of templates
> in std.format, to get a textual form for the element to be output.
> Template constraints determine which version is used for a given kind of
> element. As implemented now:
> > * There are formatValue variants for elements that implement a range
> interface (which produce an array-like form).
> > * Constraint selection for a range interface has high precedence;
> especially it shortcuts any toString function explicitely implemented by
> the programmer.
> >
> > This prevents a user to define any adequate output format for the
> elements of a custom type, if the type also happens to implement a range
> interface. A critical bug, and an absolute blocker for us.
> 
> Seeing "absolute blocker" here makes me think: is there /absolutely no 
> way/ to get that particular task done in the D programming language? I 
> might be off here because I don't have details, but my knee-jerk 
> reaction is:
> 
> struct MyRange { ... }
> void print(MyRange r) { ... }

Absolute blocker is too strong ;-) But it is still, say, a _real_ blocker; because the project in question is a library (see below). Thus, we consider not only our needs (for which a custom print indeed does the job), but also client programmers' needs, practices, expectations.
From this point of view, we cannot (or rather I do not want to) force users of the lib writing such workarounds, but instead provide them with typical D facilities. For the same reason we initially planned to use ranges ;-)

> [...] 

> > Actually, this cannot even work in numerous cases. In our project, the range's ElementType (as returned by opindex&  front) would happen to be identical to the range type itself. In this case, the formatting algorithm via a range runs into an infinite loop.
> 
> Ha, glad you mention that. When I first designed ranges I had this fear 
> in a couple of places: if a range is its own element type, then there's 
> a problem. I deferred my solution about that to later. Apparently that 
> later has come now.
> 
> > It is instead necessary to use the programmer-defined format in toString (or later writeTo).
> > In the example below, trying to write() an element yields an infinite series of "[[[...", in trying to print it like an array.
> 
> Yep, yep, great feedback. This is a fixed point in the iteration 
> typeof(r) -> typeof(r.front), which is not difficult to detect.

Exactly.

> One 
> question would be - what to do when we _do_ detect that?
> 
> (One thing that should be kept in mind is that ranges are printed the 
> way they are simply because I had to choose _something_. There is 
> definitely a lot of room for improvement there in many aspects.)

1. Why do you think we need a _default_ output format for ranges? (We can always use toString or writeTo) Again, the case of ranges is completely different from the one of arrays. Ranges are (1) one aspect of given types (2) very diverse, even structurally (3) able to define toString. I personly would never need such a default format.

2. If ever such a default format for ranges is still considered a good thing, the template selection constraints must exclude cases where (1) toString is defined (2) the range's ElementType is identical to the type defining the range.
this gets a bit more complicated with structs' .stringof. And there's also a distinct issue with classes -- see below.
(I implemented the change when I discovered the issue, but could not succeed in linking for seemingly unrelated reasons.)

> > Note that it is a common situation: for instance, (textual) string types like ones in high-level languages do not expose a separate element type (character); instead, a character is just a singleton of the type. Even more commonly, list, tree, graph nodes usually are of the same type as the container (or the root/container is of a sub-type implementing global methods).
> 
> This is interesting, but I find it difficult to imagine concrete cases 
> because in all my code the iteration strategy is encoded in the range's 
> type. For example, splitter() returns a range that iterates a string by 
> words. Arguably the input and output have the same type, but the type of 
> the range is _not_ string, it's Splitter!string or whatever. Similarly, 
> for a self-referential data structure, the range type would be e.g. 
> InOrder!TreeNode, PostOrder!TreeNode etc. I guess there are legitimate 
> cases in which the TreeNode has front() that returns itself and so on, 
> so we need to address this.

Hum, maybe I have a somewhat different vision of ranges as the one you have in mind as typical case. For me, a range interface is mainly a utility providing iteration for a type -- whatever the type is and does. It is an _aspect_ of said type. just like opApply, simply more diverse, general, and useful (notably for algorithmic needs).
You seem to consider ranges as types by themselves. Then, yes, it makes more sense to have a default format, for instance.
On a tree structure i would like to define a range interface so that people could, for instance, print its nodes in sequence, filter them, or use a custom sort.

> > Another problem is that if the type is a class (as opposed to a struct), then defining a range interface introduces a template-selection conflict with the standard class case: as of now, they are not mutually exclusive, leading to compile-time error.
> 
> I don't understand this, could you please give a bit of detail?

I guess (not really tested) any class defining a range interface runs into the compiler error. A nearly not artificial example:

struct Pixel {int x,y;}
class Image {
    Pixel[] pixels;
    private uint index = 0;
    this (Pixel[] pixels) {this.pixels = pixels;}
    // range interface
    @property void popFront () {++ this.index;}
    @property bool empty () {return (this.index >= this.pixels.length);}
    @property Pixel front () {return this.pixels[this.index];}
    // ... operational methods ...
}
unittest {
    auto image = new Image([Pixel(1,2),Pixel(3,4),Pixel(5,6),]);
    writeln(image);
}

==>

rdmd -w -debug -unittest -L--export-dynamic --build-only -of"__trials__" "__trials__.d"

/usr/include/d/dmd/phobos/std/format.d(1404): Error: template std.format.formatValue(Writer,T,Char) if (is(const(T) == const(void[]))) formatValue(Writer,T,Char) if (is(const(T) == const(void[]))) matches more than one template declaration, /usr/include/d/dmd/phobos/std/format.d(1109):formatValue(Writer,T,Char) if (isInputRange!(T) && !isSomeChar!(ElementType!(T))) and /usr/include/d/dmd/phobos/std/format.d(1260):formatValue(Writer,T,Char) if (is(T == class))

Compilation failed.

(I love this error message -- it's a single line)
Note that I did not even define toString on the class.

> [...]

> > -2- indexed iteration
> >
> > It seems there is no way to directly define indexed iteration using ranges, like commonly needed by:
> >      foreach(index,element ; collection) {...}
> 
> Aha! The time of reckoning has come :o).
> 
> > A possible but inadequate workaround would be to define a tool type for this:
> >      struct TraversalPair (Element) {uint index ; Element element;}
> > Then define the range's element output routines (opIndex, front, back) to return pairs instead of elements; and use this like:
> >      foreach (pair ; collection) {actWith(pair.element, pair.index);}
> > But this requires a client programmer to know this particularity; and anyway does not fit D common style and practice, I guess.
> 
> Clearly that's an annoying limitation of ranges when compared against 
> opApply, and we need to fix that. That being said, I see your example 
> uses the index as a simple counter 0, 1, 2, ... so again putting my 
> "could this or could this not be done in the D programming language?" I 
> can't stop thinking about this:
> 
> size_t index = 0;
> foreach (e; collection) {
>      ...
>      ++index;
> }
> 
> You need to mind the presence of "continue" and the larger scope of 
> "index", but these are simple matters.
> 
> Annoying? Definitely. Makes ranges unusable? Perhaps not as much.

You are completely, and that's what we are used to do. But again user expectations when using a D lib have some weight on one side of the balance.
I think (maybe wrongly) this issue will have to be solved one day or the other when the community grows, people regurarly use ranges and... voice their demands. (Better think at it in advance to explore and _test_ various solutions in practice without pressure.)

> > How to solve this practically? I would be happy with the above workaround if it became a commonly used solution, supported by the stdlib and if necessary by the core language. This may scale by defining the tool type as a superclass instead, allowing various variants, possibly with more elements.
> > Maybe an alternative is to use tuples: allow variants of opIndex, front, and back, to return (index,element) tuples and let the compiler use these overloads when the client code requests 2 (or more) returned values.
> > The first solution is more explicite, and possibly general; the second may fit common practice better if/when tuples become widely used (a rather far&  hypothetical future ;-) ?).
> 
> I'd think tuples are the simplest solution and the basis of the language 
> change that fixes this insufficiency.

Right. (I'm personly not fan of tuples because they don't say what they hold -- as opposed to struct-like named tuple: t[0] vs p.index--, but it's not major issue: I won't discuss it ;-)

> First, if all you want is an index you can write this:
> 
> foreach (e; zip(iota(r.length), r))
> {
>      // e[0] is 0, 1, 2, ...
>      // e[1] is the current range element
> }

Yes, but if you write a lib and have an alternative solution providing indexed iteration to the lib's clients out of the bocx, what do you choose?
Another point is "index" in the wider sense of symbol/ref/entry id or key. just like for an associative array, it's not always an ordinal (and even not always sequential so that a widened notion of iota could not always do the job).

> [More brain production toward a solution.]

> Thanks again for this report. We definitely need to work on all of the 
> above. Let me ask you this - did the project actually need to use 
> algorithms in std.algorithm? Based on my experience, my assumption is 
> that you didn't need the more complex algorithms (e.g. sort or 
> bringToFront) at all, and that you didn't have intensive use for the 
> simpler algorithms (map, reduce) either. This is because if you did, 
> ranges would have had enough strategic advantage for you, and 
> consequently you would have been much more inclined to work around the 
> issues mentioned above. As such, it's possible that you started with 
> ranges because they sounded nice to have but didn't quite find a solid 
> need for their advantages over e.g. opApply. Please let me know - thanks.

Sorry, forgot to say 2 words about that in the OP.
The lib in question is a UCS/Unicode toolkit evoked 2-3 on the list. With 2 major aspects:
* A set of tools (normalisation, casing, sorting...) called "dunicode", issued from a previous by Stephan Mueller & Ivan Melchunik. (Who let it aside waiting for D2 stabilisation, but reactivated its unicode part after seeing my posts and contacting me.)
* A higher-level universal "Text" type, using some of those routines, and possibly later used by some other ones.

Both parts are in good advancement stage (actually usable), we were waiting for merging code and structure before publishing information. (As of now, Text is for instance stand-alone, which is stupid not only for repetition, also because their algorithms and data structures are or will be better than mine.)
For curious people:
dunicode: https://bitbucket.org/stephan/dunicode/src
Text: https://bitbucket.org/denispir/denispir-d/src
(A presentation for Text is at https://bitbucket.org/denispir/denispir-d/src/b543fb352803/U%20missing%20level%20of%20abstraction. I'm not yet happy with it -- feedback welcome.)

Now, about your question:
We 3 do not strictly need much; except if we want to provide default string sorting, for instance. we can indeed work around issues for us, for our own needs, not for the users' needs.
But a range interface and its side effect on integration with the rest of the 'new' D practice and facilities (esp algorithmic) is, i guess, a great gain for clients of a lib. Hope I'm clear. I even think _this_ precisely is the point of having generalisations like ranges, higher-order funcs, or generics, built in the core of a language's toolkit. But that's only me.

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com