std.d.lexer requirements
Walter Bright
newshound2 at digitalmars.com
Wed Aug 1 17:10:07 PDT 2012
Given the various proposals for a lexer module for Phobos, I thought I'd share
some characteristics it ought to have.
First of all, it should be suitable for, at a minimum:
1. compilers
2. syntax highlighting editors
3. source code formatters
4. HTML creation
To that end:
1. It should accept as input an input range of UTF8. I feel it is a mistake to
templatize it for UTF16 and UTF32. Anyone desiring to feed it UTF16 should use
an 'adapter' range to convert the input to UTF8. (This is what component
programming is all about.)
2. It should output an input range of tokens
3. Tokens should be values, not classes
4. It should avoid memory allocation as much as possible
5. It should not read or write any mutable global state outside of its "Lexer"
instance
6. A single "Lexer" instance should be able to serially accept input ranges,
sharing and updating one identifier table
7. It should accept a callback delegate for errors. That delegate should decide
whether to:
   1. ignore the error (and "Lexer" will try to recover and continue)
   2. print an error message (and "Lexer" will try to recover and continue)
   3. throw an exception (and "Lexer" is done with that input range)
8. Lexer should be configurable as to whether it should collect information
about comments and ddoc comments or not
9. Comments and ddoc comments should be attached to the next following token,
they should not themselves be tokens
10. High speed matters a lot
11. Tokens should have begin/end line/column markers, though most of the time
this can be implicitly determined
12. It should come with unittests that, using -cov, show 100% coverage
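
To make a few of these points concrete, here is a rough sketch of how the
input/output ranges, value tokens, per-instance identifier table, error
delegate and position markers might fit together. It is only an illustration
under stated assumptions: the names Token, TokenKind, ErrorHandler, Lexer,
tokenize and TokenRange are hypothetical and not a proposed Phobos API. The
sketch recognizes only ASCII identifiers, decimal integers and one-character
symbols, counts columns in UTF-8 code units, and leaves out comments entirely.

```d
import std.array : Appender, array;
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;
import std.utf : byCodeUnit;

enum TokenKind { identifier, integer, symbol }

/// Tokens are plain structs (values), not class instances.
struct Token
{
    TokenKind kind;
    uint value;   // identifier-table index, integer value, or character code
    uint line;    // 1-based position of the token's first code unit
    uint column;
}

/// The delegate may return silently (ignore), print and return, or throw.
alias ErrorHandler = void delegate(string message, uint line, uint column);

struct Lexer
{
    ErrorHandler onError;               // per-instance, no mutable globals
    string[] identifiers;               // table shared by all inputs of this Lexer
    private uint[string] lookup;
    private Appender!(char[]) scratch;  // reused buffer to limit allocation

    /// Returns an input range of Token over an input range of UTF-8 code units.
    /// The returned range points at this Lexer, so the Lexer must outlive it.
    auto tokenize(R)(R input)
        if (isInputRange!R && is(ElementType!R : char))
    {
        return TokenRange!R(&this, input);
    }

    private uint intern(const(char)[] name)
    {
        if (auto p = name in lookup) return *p;   // known identifier, no allocation
        immutable id = cast(uint) identifiers.length;
        string owned = name.idup;                 // allocate only for new identifiers
        identifiers ~= owned;
        lookup[owned] = id;
        return id;
    }
}

struct TokenRange(R)
{
    private Lexer* lexer;
    private R input;
    private uint line = 1, column = 1;
    private Token current;
    private bool done;

    this(Lexer* lexer, R input)
    {
        this.lexer = lexer;
        this.input = input;
        popFront();                               // prime the first token
    }

    @property bool empty() const { return done; }
    @property Token front() const { return current; }

    void popFront()
    {
        for (;;)
        {
            while (!input.empty && isWhite(input.front)) advance();
            if (input.empty) { done = true; return; }
            immutable startLine = line, startColumn = column;
            immutable char c = input.front;
            if (isAlpha(c) || c == '_')
            {
                lexer.scratch.clear();
                while (!input.empty && (isAlphaNum(input.front) || input.front == '_'))
                {
                    lexer.scratch.put(input.front);
                    advance();
                }
                current = Token(TokenKind.identifier,
                                lexer.intern(lexer.scratch.data), startLine, startColumn);
                return;
            }
            if (isDigit(c))
            {
                uint v = 0;                       // overflow is not checked in this sketch
                while (!input.empty && isDigit(input.front))
                {
                    v = v * 10 + (input.front - '0');
                    advance();
                }
                current = Token(TokenKind.integer, v, startLine, startColumn);
                return;
            }
            if (c > ' ' && c < 0x7F)
            {
                advance();
                current = Token(TokenKind.symbol, c, startLine, startColumn);
                return;
            }
            // Anything else is an error here; a throwing delegate ends this input.
            if (lexer.onError !is null)
                lexer.onError("unexpected byte", startLine, startColumn);
            advance();                            // recover by skipping the byte
        }
    }

    private void advance()
    {
        if (input.front == '\n') { ++line; column = 1; }
        else ++column;
        input.popFront();
    }
}

unittest
{
    Lexer lx;
    auto toks = lx.tokenize("foo = bar1 + 42;\nfoo".byCodeUnit).array;
    assert(toks.length == 7);
    assert(toks[2] == Token(TokenKind.identifier, 1, 1, 7));
    assert(toks[6] == Token(TokenKind.identifier, 0, 2, 1));
    lx.tokenize("bar1 baz".byCodeUnit).array;     // second input, same table
    assert(lx.identifiers == ["foo", "bar1", "baz"]);
}
```

A string is fed through std.utf.byCodeUnit so that the lexer sees UTF-8 code
units rather than decoded dchars; a UTF-16 source would need a converting
adapter range in front instead. Returning silently from the delegate covers
the "ignore" and "print" cases, since the sketch then skips the offending byte
and carries on.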
Basically, I don't want anyone to be motivated to do a separate one after seeing
this one.