Request for comments: std.d.lexer

Tue Feb 5 20:27:17 PST 2013

On Tuesday, February 05, 2013 22:51:32 Andrei Alexandrescu wrote:
> I think it would be reasonable for a lexer to require a range of ubyte
> as input, and carry its own decoding. In the first approximation it may
> even require a random-access range of ubyte.

I'd have to think about how you'd handle Unicode in that case, since I'm not 
quite sure what you mean by the lexer carrying its own decoding if its input is 
a range of code units. What I've been working on works with all of the 
character types and is careful to decode only when it actually has to. That 
wasn't all that hard as far as the lexer's code goes. The hard part was making 
std.utf work with ranges of code units rather than just strings, and that was 
committed months ago.
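
To illustrate the idea of avoiding unnecessary decoding on a range of code 
units, here is a minimal sketch. It is not the lexer's actual code. The name 
countCodePoints is hypothetical, and it uses std.utf.byCodeUnit from current 
Phobos, which may postdate this message. ASCII code units are consumed 
directly, and std.utf.decodeFront is called only when a code unit starts a 
longer sequence.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.traits : isSomeChar;
import std.utf : byCodeUnit, decodeFront;

/// Hypothetical helper: counts the code points in a range of code units
/// (char, wchar, or dchar), decoding only the non-ASCII sequences.
size_t countCodePoints(R)(R input)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t count;
    while (!input.empty)
    {
        if (input.front < 0x80)
            input.popFront();     // ASCII: a single code unit, nothing to decode
        else
            input.decodeFront();  // non-ASCII: let std.utf consume the sequence
        ++count;
    }
    return count;
}
```

Calling it as countCodePoints(source.byCodeUnit) keeps a string from being 
auto-decoded on every front, so the decoding cost is paid only where non-ASCII 
text appears.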

- Jonathan M Davis