Request for comments: std.d.lexer

Brian Schott briancschott at gmail.com
Sun Jan 27 02:42:17 PST 2013


On Sunday, 27 January 2013 at 10:17:48 UTC, Philippe Sigaud wrote:
> * Having a range interface is good. Any reason why you made byToken a
> class and not a struct? Most (like, 99%) of the ranges in Phobos are
> structs. Do you need reference semantics?

It implements the InputRange interface from std.range so that 
users have a choice of using template constraints or the OO model 
in their code.
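
To illustrate the two styles, here is a minimal sketch. It does not use
the lexer itself: a plain string[] stands in for a token stream, and the
function names countWithConstraint and countWithInterface are made up
for this example. Only InputRange, inputRangeObject, and isInputRange
come from std.range.

```d
import std.range;

// Template-constraint style: accepts any input range, struct or class.
size_t countWithConstraint(R)(R range) if (isInputRange!R)
{
    size_t n;
    for (; !range.empty; range.popFront())
        ++n;
    return n;
}

// OO style: accepts any object implementing the InputRange!string interface.
size_t countWithInterface(InputRange!string range)
{
    size_t n;
    for (; !range.empty; range.popFront())
        ++n;
    return n;
}

unittest
{
    // Hypothetical stand-in for a stream of token values.
    string[] tokens = ["int", "x", "=", "1", ";"];

    // The template version works with a plain array and with a class range.
    assert(countWithConstraint(tokens) == 5);
    assert(countWithConstraint(inputRangeObject(tokens)) == 5);

    // The interface version needs an object; inputRangeObject wraps the array.
    assert(countWithInterface(inputRangeObject(tokens)) == 5);
}
```

A class that implements InputRange can be passed to either function,
which is the choice described above. Note that a class range has
reference semantics, so iterating it in one place consumes it everywhere.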

> * Also, is there a way to keep comments? Any code wanting to modify
> the code might need them.
> (edit: Ah, I see it: IterationStyle.IncludeComments)
>
> * I'd distinguish between standard comments and documentation
> comments. These are different beasts, to my eyes.

The standard at http://dlang.org/lex.html doesn't differentiate 
between them. It's trivial to write a function that checks if a 
token starts with "///", "/**", or "/++" while iterating over the 
tokens.
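
A minimal sketch of such a check, assuming the comment text is available
as a string (the function name isDocComment is hypothetical and not part
of the proposed module):

```d
import std.algorithm : startsWith;

// Returns true if the comment text begins with a documentation-comment opener.
bool isDocComment(string text)
{
    // With several needles, startsWith returns the 1-based index of the
    // needle that matched, or 0 if none matched.
    return text.startsWith("///", "/**", "/++") != 0;
}

unittest
{
    assert(isDocComment("/// Summary line"));
    assert(isDocComment("/** Block doc */"));
    assert(isDocComment("/++ Nesting doc +/"));
    assert(!isDocComment("// plain comment"));
    assert(!isDocComment("/* plain block */"));
    assert(!isDocComment("/+ plain nesting +/"));
}
```

A caller would apply this to the value of each comment token while
iterating with comments included.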

> * I see Token has a startIndex member. Any reason not to have an
> endIndex member? Or can an end index always be deduced from
> startIndex and value.length?

That's the idea: the end index is startIndex + value.length.
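
As a sketch of that deduction, here is a hypothetical Token struct. Only
the startIndex and value members are taken from this thread; the field
types and the endIndex helper are assumptions, and the example assumes
value holds the token's exact source text.

```d
// Hypothetical token type; the real Token has more members.
struct Token
{
    string value;      // source text of the token
    size_t startIndex; // offset of the token's first code unit in the input

    // One past the last code unit, so [startIndex, endIndex) spans the token.
    size_t endIndex() const
    {
        return startIndex + value.length;
    }
}

unittest
{
    string source = "int x = 42;";
    auto tok = Token("42", 8);

    assert(tok.endIndex == 10);
    // Slicing the input with the two indices recovers the token text.
    assert(source[tok.startIndex .. tok.endIndex] == tok.value);
}
```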

> * How does it fare with non ASCII code?

Everything is templated on the character type, but I haven't done 
any testing on UTF-16 or UTF-32. Valgrind still shows functions 
from std.uni being called, so at the moment I assume it works.

> * A rough estimate of number of tokens/s would be good (I know it'll
> vary). Walter seems to think if a lexer is not able to vomit thousands
> of tokens a second, then it's not good. On a related note, does your
> lexer have any problem with 10k+-lines files?

$ time dscanner --sloc ../phobos/std/datetime.d
14950

real	0m0.319s
user	0m0.313s
sys	0m0.006s

$ time dmd -c ../phobos/std/datetime.d

real	0m0.354s
user	0m0.318s
sys	0m0.036s

Yes, I know that "time" is a terrible benchmarking tool, but the two 
runs are fairly close (0.319s vs. 0.354s real), for whatever that's 
worth.


