Request for comments: std.d.lexer

Brian Schott briancschott at gmail.com
Sun Jan 27 02:55:48 PST 2013


On Sunday, 27 January 2013 at 10:32:39 UTC, deadalnix wrote:
> Very happy to see that !
>
> Some remarks :
>  - Many parameters should be compile-time parameters instead 
> of runtime ones.

I decided not to do this because the lexer actually calls itself 
while parsing token strings. If they were compile-time 
parameters, each distinct combination of options would instantiate 
its own copy of the lexer, so the compiler would likely generate 
a lot more code.
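
A rough sketch of that trade-off, using a toy word counter instead 
of the real lexer. The names skipComments, countWordsRuntime and 
countWordsTemplate are made up for illustration and are not part 
of the proposed API:

```d
import std.array : split;

// Hypothetical option flag; not part of the proposed std.d.lexer API.
enum uint skipComments = 1;

// Runtime parameter: one function body is compiled, and a nested call
// (for example, to lex the contents of a token string) reuses it.
size_t countWordsRuntime(string source, uint flags)
{
    size_t n;
    foreach (word; source.split())
    {
        if ((flags & skipComments) != 0
            && word.length >= 2 && word[0 .. 2] == "//")
            break; // treat everything after "//" as a comment
        ++n;
    }
    return n;
}

// Compile-time parameter: each distinct `flags` value instantiates a
// separate copy of this function in the generated code.
size_t countWordsTemplate(uint flags)(string source)
{
    size_t n;
    foreach (word; source.split())
    {
        static if ((flags & skipComments) != 0)
        {
            if (word.length >= 2 && word[0 .. 2] == "//")
                break;
        }
        ++n;
    }
    return n;
}
```

The template version removes the flag test from the loop, but 
countWordsTemplate!0 and countWordsTemplate!1 are two separate 
functions in the binary. With several independent options the 
number of instantiations can grow quickly.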

>  - I'm not a big fan of byToken name, but let's see what others 
> think of it.

Chosen for consistency with the various functions in std.stdio, 
but now that you point this out, it's not very consistent with 
std.algorithm or std.range.

>  - I'm not sure it is the role of the lexer to process 
> __IDENTIFIER__ special tokens.

According to http://dlang.org/lex#specialtokens this is the 
correct behavior.

>  - You need to provide a way to specify how the textual 
> representation of the token (i.e. its value) is set. The best way 
> to do it IMO is an alias parameter that returns a string when 
> called with a string (then the user can choose to keep the 
> value from the original string, create a copy, always get the 
> same copy for the same string, etc.).

The lexer does not operate on slices of its input. It would be 
possible to special-case for this in the future.

>  - Ideally, the location format should be configurable.
>  - You must return at least a forward range, and not an input 
> range, otherwise a lexer cannot look ahead.

It's easy to wrap this range inside of another that does 
buffering for lookahead. 
https://github.com/Hackerpilot/Dscanner/blob/range-based-lexer/circularbuffer.d
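
As a minimal sketch of that idea (not the circular buffer linked 
above): the Lookahead wrapper below, with its hypothetical peek and 
canPeek members, buffers elements from any input range in a plain 
dynamic array so that a parser can inspect upcoming tokens without 
consuming them.

```d
import std.range.primitives : isInputRange, ElementType,
    empty, front, popFront;

// Hypothetical wrapper; the name and members are illustrative only.
struct Lookahead(R) if (isInputRange!R)
{
    private R source;
    private ElementType!R[] buffer;

    this(R source) { this.source = source; }

    @property bool empty()
    {
        return buffer.length == 0 && source.empty;
    }

    @property ElementType!R front()
    {
        fill(1);
        assert(buffer.length > 0, "front called on empty range");
        return buffer[0];
    }

    void popFront()
    {
        fill(1);
        assert(buffer.length > 0, "popFront called on empty range");
        buffer = buffer[1 .. $];
    }

    // True if an element exists `n` positions ahead (0 is `front`).
    bool canPeek(size_t n)
    {
        fill(n + 1);
        return n < buffer.length;
    }

    // Returns the element `n` positions ahead without consuming it.
    ElementType!R peek(size_t n)
    {
        fill(n + 1);
        assert(n < buffer.length, "peek past end of input");
        return buffer[n];
    }

    // Pulls elements from the source until `count` are buffered
    // or the source is exhausted.
    private void fill(size_t count)
    {
        while (buffer.length < count && !source.empty)
        {
            buffer ~= source.front;
            source.popFront();
        }
    }
}
```

A growing array is the simplest thing that works here. A fixed-size 
circular buffer avoids the reallocations at the cost of capping how 
far ahead the parser can look.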

> And the famous Jobs « one last thing »: I'm not a big fan of 
> having OPERATORS_BEGIN of the same type as regular token types. 
> Now they are valid tokens. Why not provide a set of functions 
> like isOperator?

This eliminates possible uses of the case range statement. It may 
be the case (ha ha) that nobody cares and would rather have those 
functions. I'd be fine with changing that.
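
To make the two options concrete, here is a small sketch. The enum 
below is hypothetical (the member names and ordering are invented 
for the example and are not the proposed TokenType); it only shows 
that marker members allow both a case range and an isOperator-style 
function.

```d
// Hypothetical token enum; not the proposed std.d.lexer definition.
enum TokenType : ushort
{
    operatorsBegin, // marker, not a real token
    plus,
    minus,
    star,
    operatorsEnd,   // marker, not a real token
    keywordsBegin,  // marker, not a real token
    if_,
    while_,
    keywordsEnd,    // marker, not a real token
    identifier,
}

// Function-based classification, as suggested in the quoted remark.
bool isOperator(TokenType t) pure nothrow @safe
{
    return t > TokenType.operatorsBegin && t < TokenType.operatorsEnd;
}

// Classification with case range statements, which relies on the
// enum members being laid out contiguously.
string classify(TokenType t)
{
    switch (t)
    {
    case TokenType.plus: .. case TokenType.star:
        return "operator";
    case TokenType.if_: .. case TokenType.while_:
        return "keyword";
    default:
        return "other";
    }
}
```

If the markers were removed from the enum, the case ranges would 
have to name the first and last real member of each group, which is 
easy to break when new tokens are added.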

