std.d.lexer requirements
Jonathan M Davis
jmdavisProg at gmx.com
Thu Aug 2 16:52:35 PDT 2012
On Thursday, August 02, 2012 19:30:47 Andrei Alexandrescu wrote:
> On 8/2/12 7:18 PM, Jonathan M Davis wrote:
> Your insights are always appreciated; even their Cliff notes :o).
LOL. Well, I'm not about to decide on the best approach to this without
thinking through it more. What I've been doing manages to deal quite nicely
with avoiding unnecessary decoding and still allows for the lexing of ranges
of dchar which aren't strings (though there's obviously an efficiency hit
there), and it really isn't complicated or messy thanks to some basic mixins
that I've been using. Switching to operating specifically on code units and not
accepting ranges of dchar at all has some serious ramifications, and I have to
think through them all before I take a position on that.
> > but Walter seems to be arguing that strings
> > should be treated as ranges of code units in general, which I think is
> > completely wrong.
>
> I think Walter has very often emphasized the need for the lexer to be
> faster than the usual client software. My perception is that he's
> discussing lexer design in understanding there's a need for a less
> comfortable approach, namely do decoding in client.
That may be, but if he's arguing that strings should _always_ be treated as
ranges of code units - as in all D programs, most of which don't have anything
to do with lexers (other than when they're compiled) - then I'm definitely
going to object to that, and it's my understanding that that's what he's
arguing. But maybe I've misunderstood.
I've been arguing that strings should still be treated as ranges of code
points and that that does not preclude making the lexer efficiently operate on
code units when operating on strings even if it operates on ranges of dchar. I
think that whether making the lexer operate on ranges of dchar but specialize
on strings is a better approach or making it operate specifically on ranges of
code units is a better approach depends on what we want it to be usable with.
It should be just as fast with strings in either case, so it becomes a
question of how we want to handle ranges which _aren't_ strings.
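
To make the "ranges of dchar, but specialized on strings" option concrete,
here is a minimal sketch rather than actual lexer code. countLeadingBlanks is
a hypothetical helper invented for illustration. It accepts any input range
whose element type is dchar (which includes string, since Phobos presents
narrow strings as ranges of dchar), and uses static if to index code units
directly when the argument is a narrow string, so nothing gets decoded on
that path.

```d
import std.range;                       // isInputRange, ElementType, empty/front/popFront
import std.traits : isNarrowString, Unqual;

/// Hypothetical helper: counts the ASCII spaces and tabs at the start of
/// any input range of dchar.
size_t countLeadingBlanks(R)(R input)
    if (isInputRange!R && is(Unqual!(ElementType!R) == dchar))
{
    size_t n = 0;
    static if (isNarrowString!R)
    {
        // Specialized path for char[]/wchar[] strings: look at code units
        // by index. ASCII values never appear inside a multi-unit UTF-8 or
        // UTF-16 sequence, so no decoding is needed to recognize them.
        while (n < input.length && (input[n] == ' ' || input[n] == '\t'))
            ++n;
    }
    else
    {
        // Generic path: any other range of dchar, one element at a time.
        for (; !input.empty && (input.front == ' ' || input.front == '\t');
             input.popFront())
            ++n;
    }
    return n;
}
```

Called as countLeadingBlanks("  \tfoo") it takes the code-unit path; called
with a dstring or with a lazily filtered string it takes the generic path.
The caller sees the same interface either way.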
I suppose that we could make it operate on code units and just let ranges of
dchar have UTF-32 as their code unit (since dchar is both a code unit and a
code point), then ranges of dchar will still work but ranges of char and wchar
will _also_ work. Hmmm. As I said, I'll have to think this through a bit.
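
The code-unit alternative could look roughly like the sketch below.
countLeadingDigits is a hypothetical helper invented for illustration, and
the sketch assumes a Phobos recent enough to provide std.utf.byCodeUnit,
which is one way to view a string as its code units without decoding. The
template accepts ranges of char, wchar, or dchar alike and treats every
element as a code unit.

```d
import std.range;                       // isInputRange, ElementType, empty/front/popFront
import std.traits : isSomeChar;
import std.utf : byCodeUnit;

/// Hypothetical helper: counts the ASCII digits at the start of a range of
/// code units. For a range of dchar, each UTF-32 code unit is also a code
/// point, so the same loop covers that case.
size_t countLeadingDigits(R)(R units)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n = 0;
    for (; !units.empty && units.front >= '0' && units.front <= '9';
         units.popFront())
        ++n;
    return n;
}
```

With this shape, "42abc".byCodeUnit is walked as UTF-8 code units,
"42abc"w.byCodeUnit as UTF-16 code units, and "42abc"d as UTF-32 code units.
A plain string passed without byCodeUnit still compiles, but the range
primitives decode it to dchar first, which is exactly the cost that the
code-unit design is meant to avoid.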
- Jonathan M Davis