RFC on range design for D2

Benji Smith dlanguage at benjismith.net
Thu Sep 11 02:42:29 PDT 2008


Bill Baxter wrote:
> On Thu, Sep 11, 2008 at 1:31 PM, Benji Smith <dlanguage at benjismith.net> wrote:
>> Parsers & regex engines move both forward and backward, as they try to match
>> characters to a pattern.
>>
>> Really, anything that uses an NFA or DFA to define patterns would benefit
>> from fast bidirectional iteration...
> 
> Good call.  I was about to post something mentioning that Turing
> machines but that seemed too academic.  Same class of thing as
> NFA/DFA/FSM.
> 
> The question is, though, would you really implement those things using
> a linked list?  I would expect most of those suckers work on arrays,
> and so can take advantage of the bidirectional nature of random access
> ranges.

Actually, Perl 6 will (assuming they ever finish it) finally allow regex 
matching against input streams:

http://dev.perl.org/perl6/doc/design/syn/S05.html

It's a big document. Search for the text "Matching against non-strings"

This kind of thing was one of the main arguments I made in my "Why 
Strings as Classes" thread, that got everyone's panties in a bunch and 
that no one else agreed with.

In that thread, I argued that Strings should be objects so that they can 
implement a CharSequence interface (or something like it). And then all 
the standard text-processing stuff could be written against that 
interface, allowing regex engines and parsers to be agnostic about 
whether they read from an actual in-memory string or from a 
(file|database|socket) input stream.

With the range proposal on the table, I'd be just as happy if all the D 
text-processing stuff in the standard libs was implemented against a 
Range!(T), where T is one of the char types. Especially if ranges can be 
infinite.

Bill Baxter wrote:
> Hmm, for FSMs you can't really define a good end state.  There may not
> be any particular end state. ... ah, but wait I forgot.  That's the
> beauty of a range -- the end "state" doesn't have to be a "state" per
> se.  It can be any predicate you want it to be.  "Range" is misleading
> in this case.  This is one of those cases where you just have to
> remember "range" means "current value plus stopping criterion".

That's what I was saying earlier.

I think the mechanics are good. And for contiguous, sequential 
containers, the word "range" is great. For other types of containers, or 
other selection/iteration scenarios, you can shoehorn your mental model 
into the "range" metaphor. But it's weird.

--benji


More information about the Digitalmars-d-announce mailing list