std.d.lexer requirements

Wed Aug 1 22:44:14 PDT 2012

On Wednesday, August 01, 2012 22:33:12 Walter Bright wrote:
> The lexer must use char or it will not be acceptable as anything but a toy
> for performance reasons.

Decoding can be avoided by special-casing strings while still operating on
ranges of dchar in general, so you'd be operating almost entirely on ASCII.
Are you saying that there's a performance issue aside from decoding?

> Somebody has to convert the input files into dchars, and then back into
> chars. That blows for performance. Think billions and billions of
> characters going through, not just a few random strings.

Why is there any converting to dchar going on here? I don't see why any would 
be necessary. If you're reading in a file as a string or char[] (as would be
typical), then you're operating on a string, and then the only time that any
decoding will be necessary is when you actually need to operate on a Unicode
character, which is very rare in D's grammar. It's only when operating on
something _other_ than a string that you'd have to actually deal with dchars.
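
To make that concrete, here is a minimal sketch of what I mean. The function
name identifierLength is made up for illustration and is not taken from dmd or
from any proposed std.d.lexer API. It also simplifies the real rules: it uses
std.uni.isAlpha as a stand-in for D's actual set of allowed identifier
characters and ignores the rule that an identifier cannot start with a digit.
The point is only that the loop works on UTF-8 code units and calls
std.utf.decode solely when it hits a byte outside the ASCII range.

```d
import std.ascii : isAlphaNum;
import std.uni : isAlpha;
import std.utf : decode;

// Hypothetical helper: returns the length, in UTF-8 code units, of the
// identifier-like run at the start of src. Simplified for illustration.
size_t identifierLength(string src)
{
    size_t i = 0;
    while (i < src.length)
    {
        // Indexing a string yields a single code unit; nothing is decoded.
        immutable char c = src[i];
        if (c < 0x80)
        {
            // ASCII fast path: by far the most common case in D source.
            if (c == '_' || isAlphaNum(c))
            {
                ++i;
                continue;
            }
            break;
        }

        // Non-ASCII lead byte: decode exactly one code point.
        // decode advances next past the code units it consumed.
        size_t next = i;
        immutable dchar d = decode(src, next);
        if (!isAlpha(d)) // assumption: approximates D's identifier rules
            break;
        i = next;
    }
    return i;
}

void main()
{
    assert(identifierLength("foo_1 = 2;") == 5);
    assert(identifierLength("+x") == 0);
}
```

For pure-ASCII input, decode is never called, so no dchar is ever produced or
converted back.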

> > Hmmm. Well, I'd still argue that that's a parser thing. Pretty much nothing
> > else will care about it. At most, it should be an optional feature of the
> > lexer. But it certainly could be added that way.
> 
> I hate to say "trust me on this", but if you don't, have a look at dmd's
> lexer and how it handles identifiers, then look at dmd's symbol table.

My point is that it's the sort of thing that _only_ a parser would care about. 
So, unless it _needs_ to be in the lexer for some reason, it shouldn't be.

- Jonathan M Davis