std.d.lexer requirements
Brad Roberts
braddr at puremagic.com
Sun Aug 5 00:59:40 PDT 2012
To help with performance comparisons, I ripped dmd's lexer out and got it building as a few .d files. It's very crude.
It's got tons of casts (more than the original C++ version). I attempted no cleanup and made no changes beyond the
minimum needed to get it to build and run. Obviously there's tons of room for cleanup, but that's not the point;
it's just useful as a baseline.
The branch:
https://github.com/braddr/phobos/tree/dmd_lexer
The commit with the changes:
https://github.com/braddr/phobos/commit/040540ef3baa38997b15a56be3e9cd9c4bfa51ab
On my desktop (far from idle, it's running 2 of the auto testers), it consistently takes 0.187s to lex all of the .d
files in phobos.
Later,
Brad
On 8/1/2012 5:10 PM, Walter Bright wrote:
> Given the various proposals for a lexer module for Phobos, I thought I'd share some characteristics it ought to have.
>
> First of all, it should be suitable for, at a minimum:
>
> 1. compilers
>
> 2. syntax highlighting editors
>
> 3. source code formatters
>
> 4. html creation
>
> To that end:
>
> 1. It should accept as input an input range of UTF8. I feel it is a mistake to templatize it for UTF16 and UTF32. Anyone
> desiring to feed it UTF16 should use an 'adapter' range to convert the input to UTF8. (This is what component
> programming is all about.)
>
> 2. It should output an input range of tokens
>
> 3. tokens should be values, not classes
>
> 4. It should avoid memory allocation as much as possible
>
> 5. It should not read or write any mutable global state outside of its "Lexer"
> instance
>
> 6. A single "Lexer" instance should be able to serially accept input ranges, sharing and updating one identifier table
>
> 7. It should accept a callback delegate for errors. That delegate should decide whether to:
> 1. ignore the error (and "Lexer" will try to recover and continue)
> 2. print an error message (and "Lexer" will try to recover and continue)
>    3. throw an exception, in which case "Lexer" is done with that input range
>
> 8. Lexer should be configurable as to whether it should collect information about comments and ddoc comments or not
>
> 9. Comments and ddoc comments should be attached to the next following token, they should not themselves be tokens
>
> 10. High speed matters a lot
>
> 11. Tokens should have begin/end line/column markers, though most of the time this can be implicitly determined
>
> 12. It should come with unittests that, using -cov, show 100% coverage
>
>
> Basically, I don't want anyone to be motivated to do a separate one after seeing this one.
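
For illustration only, here is a minimal sketch in D of how several of the points above could fit together: tokens as
values, an input range of UTF-8 in and an input range of tokens out, one identifier table per "Lexer" instance, and an
error delegate. This is not dmd's lexer and not a proposed std.d.lexer API. Every name in it (Token, TokKind, Lexer,
lex, TokenRange, ErrorHandler) is made up for the example. It only knows ASCII identifiers, decimal digits and single
punctuation characters, counts columns in code units, and allocates token text freely, which a real implementation
would want to avoid. The only library calls are existing Phobos ones (std.ascii, std.range.primitives, std.utf.byChar,
std.array.array).

```d
import std.array : array;
import std.ascii : isAlpha, isAlphaNum, isDigit, isPunctuation, isWhite;
import std.range.primitives : ElementType, isInputRange;
import std.traits : Unqual;
import std.utf : byChar;

enum TokKind { identifier, number, punct }   // hypothetical, far from complete

struct Token                                 // point 3: a value, not a class
{
    TokKind kind;
    string text;                             // identifiers: the interned copy
    uint line, column;                       // point 11: 1-based start position
}

// Point 7: the caller decides whether to ignore, print, or throw.
alias ErrorHandler = void delegate(string msg, uint line, uint column);

struct Lexer
{
    string[string] identifiers;              // point 6: shared across inputs
    ErrorHandler onError;

    // Points 1 and 2: UTF-8 code units in, a lazy range of tokens out.
    // The Lexer must outlive the returned range, which keeps a pointer to it.
    auto lex(R)(R input)
        if (isInputRange!R && is(Unqual!(ElementType!R) == char))
    {
        return TokenRange!R(&this, input);
    }
}

struct TokenRange(R)
{
    private Lexer* lexer;
    private R input;
    private uint line = 1, column = 1;
    private Token current;
    private bool done;

    this(Lexer* lexer, R input)
    {
        this.lexer = lexer;
        this.input = input;
        popFront();                          // prime the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        for (;;)
        {
            while (!input.empty && isWhite(input.front)) advance();
            if (input.empty) { done = true; return; }
            immutable startLine = line, startCol = column;
            immutable char c = input.front;
            if (isAlpha(c) || c == '_')
            {
                string s;
                while (!input.empty && (isAlphaNum(input.front) || input.front == '_'))
                    s ~= advance();
                if (auto p = s in lexer.identifiers) s = *p;   // reuse interned text
                else lexer.identifiers[s] = s;
                current = Token(TokKind.identifier, s, startLine, startCol);
                return;
            }
            if (isDigit(c))
            {
                string s;
                while (!input.empty && isDigit(input.front)) s ~= advance();
                current = Token(TokKind.number, s, startLine, startCol);
                return;
            }
            advance();
            if (isPunctuation(c))
            {
                current = Token(TokKind.punct, [c], startLine, startCol);
                return;
            }
            // Anything else is an error; if the handler returns, skip and go on.
            if (lexer.onError !is null)
                lexer.onError("unexpected character", startLine, startCol);
        }
    }

    private char advance()                   // consume one code unit
    {
        char c = input.front;
        input.popFront();
        if (c == '\n') { ++line; column = 1; } else ++column;
        return c;
    }
}

unittest
{
    string[] errors;
    Lexer lexer;
    lexer.onError = (string msg, uint line, uint column) { errors ~= msg; };

    auto toks = lexer.lex("foo = 42;\nfoo \x01".byChar).array;
    assert(toks.length == 5 && errors.length == 1);
    assert(toks[2] == Token(TokKind.number, "42", 1, 7));
    assert(toks[4].text is toks[0].text);    // same interned identifier

    // UTF-16 goes through an adapter range; the table persists across inputs.
    auto more = lexer.lex("foo"w.byChar).array;
    assert(more[0].text is toks[0].text);
}
```

Saved as, say, sketch.d (a placeholder name), this should run with "dmd -unittest -main -run sketch.d". Comment
attachment (points 8 and 9) and the 100% -cov requirement (point 12) are left out to keep it short.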