std.d.lexer requirements

Tue Aug 7 14:14:01 PDT 2012

On Tuesday, August 07, 2012 12:38:26 Walter Bright wrote:
> Yes, I understand that. There's also a point about adding too much
> complexity to the interface. The delegate callback reduces complexity in
> the interface.

Allowing a choice between returning a token and using a delegate doesn't really
add much complexity, especially if ignoring errors is treated as a separate
option rather than simply using a delegate that skips them (which may or may
not be beneficial - it's faster without the delegate, but it's actually kind of
hard to get lexing errors).
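
As a rough sketch of the two options side by side (every name here is
hypothetical and only for illustration, not part of any proposed std.d.lexer
API): a toy lexer that either returns an error token or, if the caller
supplies a delegate, reports the error to it and skips the bad text.

```d
import std.ascii : isDigit, isWhite;

enum TokenType { number, error, eof }

struct Token
{
    TokenType type;
    string text;
}

// Hypothetical callback type: receives the text that failed to lex.
alias ErrorHandler = void delegate(string badText);

// Toy lexer: splits on whitespace and accepts only runs of ASCII digits.
// With no delegate, a malformed run comes back as an error token.
// With a delegate, the run is reported to it and left out of the result.
Token[] lexNumbers(string src, ErrorHandler onError = null)
{
    Token[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        if (isWhite(src[i]))
        {
            ++i;
            continue;
        }
        immutable start = i;
        bool ok = true;
        while (i < src.length && !isWhite(src[i]))
        {
            if (!isDigit(src[i]))
                ok = false;
            ++i;
        }
        auto text = src[start .. i];
        if (ok)
            tokens ~= Token(TokenType.number, text);
        else if (onError is null)
            tokens ~= Token(TokenType.error, text);
        else
            onError(text);
    }
    tokens ~= Token(TokenType.eof, "");
    return tokens;
}

unittest
{
    // Error returned as a token.
    auto withTokens = lexNumbers("12 x3 45");
    assert(withTokens[1].type == TokenType.error);

    // Error routed to a delegate instead.
    string[] errors;
    auto withDelegate = lexNumbers("12 x3 45", (string s) { errors ~= s; });
    assert(errors == ["x3"]);
}
```

Supporting both costs one defaulted parameter and one extra branch in this toy
version; a real lexer would presumably have more places where the choice
matters.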

What worries me more is stuff like providing a way to have the range calculate
the current position itself (as Christophe suggested IIRC) or having it
provide an efficient way to determine the number of code units between two
ranges so that you can slice the lexed range and put the slice in the Token.
Determining the number of code units is easily done with ptr for strings, but
for everything else, you generally have to count as code units are consumed.
That isn't really an issue for small tokens (especially those like symbols
where the length is known without counting), but it does add up for
arbitrarily long ones such as comments or string literals. So, providing a way
to calculate it more efficiently where possible might be desirable, but it's
yet another layer of complication, and I don't know that it's actually
possible to provide such a function in enough situations for it to be worth
providing.
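
To illustrate the difference in cost, here is a minimal sketch with
hypothetical helper names (lexedSlice, Counting, counting), assuming the
string case works on two slices of the same array. For strings the length
falls out of a pointer subtraction; for a generic input range, a wrapper has
to bump a counter on every popFront.

```d
import std.range.primitives : empty, front, popFront, isInputRange;
import std.string : representation;

// Strings: `rest` must be a slice of the same array as `start`, beginning at
// or after it. The code unit count is just the pointer difference, so the
// lexed text can be sliced out without any counting.
string lexedSlice(string start, string rest)
{
    return start[0 .. rest.ptr - start.ptr];
}

// Everything else: wrap the range and count each element as it is consumed.
struct Counting(R)
    if (isInputRange!R)
{
    R source;
    size_t consumed;

    @property bool empty() { return source.empty; }
    @property auto front() { return source.front; }

    void popFront()
    {
        source.popFront();
        ++consumed;
    }
}

// Convenience constructor so the range type is inferred.
auto counting(R)(R source)
{
    return Counting!R(source);
}

unittest
{
    // String case: O(1) regardless of how long the comment is.
    string src = "/* note */ x";
    string rest = src[10 .. $];
    assert(lexedSlice(src, rest) == "/* note */");

    // Generic case: one increment per code unit consumed.
    auto r = counting("abc def".representation);
    while (!r.empty && r.front != ' ')
        r.popFront();
    assert(r.consumed == 3);
}
```

The wrapper's extra increment is negligible for a short symbol but is paid for
every code unit of a long comment or string literal, which is the overhead
described above.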

I expect that the configuration stuff is going to have to be adjusted after I'm
done, since it isn't entirely clear yet what's worth making configurable and
what isn't.

- Jonathan M Davis