std.d.lexer requirements
Jonathan M Davis
jmdavisProg at gmx.com
Wed Aug 1 22:44:14 PDT 2012
On Wednesday, August 01, 2012 22:33:12 Walter Bright wrote:
> The lexer must use char or it will not be acceptable as anything but a toy
> for performance reasons.
You can avoid decoding with strings while still operating on ranges of dchar,
so you'd be operating almost entirely on ASCII. Are you saying that there's a
performance issue aside from decoding?
> Somebody has to convert the input files into dchars, and then back into
> chars. That blows for performance. Think billions and billions of
> characters going through, not just a few random strings.
Why is there any converting to dchar going on here? I don't see why any would
be necessary. If you're reading in a file as a string or char[] (as would be
typical), then you're operating on a string, and the only time that any
decoding will be necessary is when you actually need to operate on a Unicode
character, which is very rare in D's grammar. It's only when operating on
something _other_ than a string that you'd have to actually deal with dchars.
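
To make that concrete, here is a minimal sketch of the idea, not code from dmd
or from any proposed std.d.lexer. The helper name identifierLength is
hypothetical, and using std.uni.isAlpha for non-ASCII identifier characters is
an assumption that only approximates what D's grammar allows. The point is
just that ASCII code units are compared directly and decode is called only
when a non-ASCII code unit shows up.

```d
import std.ascii : isAlphaNum, isDigit;
import std.uni : isAlpha;
import std.utf : decode;

/// Hypothetical helper: returns the length, in UTF-8 code units, of the
/// identifier at the start of src, or 0 if src does not start with one.
size_t identifierLength(string src)
{
    // An identifier cannot be empty or start with a digit.
    if (src.length == 0 || isDigit(src[0]))
        return 0;

    size_t i = 0;
    while (i < src.length)
    {
        immutable char c = src[i];
        if (c < 0x80)
        {
            // ASCII fast path: look at the code unit itself, no decoding.
            if (!(isAlphaNum(c) || c == '_'))
                break;
            ++i;
        }
        else
        {
            // Rare path: decode one code point only because we hit non-ASCII.
            // Assumption: isAlpha stands in for the grammar's real rule.
            size_t next = i;
            immutable dchar d = decode(src, next);
            if (!isAlpha(d))
                break;
            i = next;
        }
    }
    return i;
}
```

For typical source code the else branch is almost never taken, so the loop
works on chars the whole way through and never builds a dchar.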
> > Hmmm. Well, I'd still argue that that's a parser thing. Pretty much
> > nothing else will care about it. At most, it should be an optional
> > feature of the lexer. But it certainly could be added that way.
>
> I hate to say "trust me on this", but if you don't, have a look at dmd's
> lexer and how it handles identifiers, then look at dmd's symbol table.
My point is that it's the sort of thing that _only_ a parser would care about.
So, unless it _needs_ to be in the lexer for some reason, it shouldn't be.
- Jonathan M Davis