std.d.lexer: pre-voting review / discussion

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Sep 12 09:10:18 PDT 2013


12-Sep-2013 19:39, Timon Gehr wrote:
> On 09/11/2013 08:49 PM, Walter Bright wrote:
>> 4. When naming tokens like .. 'slice', it is giving it a
>> syntactic/semantic name rather than a token name. This would be awkward
>> if .. took on new meanings in D. Calling it 'dotdot' would be clearer.
>> Ditto for the rest. For example that is done better, '*' is called
>> 'star', rather than 'dereference'.
>
> FWIW, I use Tok!"..". I.e. a "UDL" for specifying kinds of tokens when
> interfacing with the parser. Some other kinds of tokens get a canonical
> representation. Eg. Tok!"i" is the kind of identifier tokens, Tok!"0" is
> the kind of signed integer literal tokens etc.

I like this.
Not only does this have the benefit of not colliding with keywords, I
also imagine it could be incredibly convenient to get back the symbolic
representation of a token (when a token is used as a parameter to an
AST node, say BinaryExpr!(Tok!"+")). And truth be told, we all know how
tokens look in symbolic form, so learning a pack of names for them feels
pointless.
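
A minimal sketch of how such a Tok template might look. Everything
here (TokenKind, spellings, lookup, symbol, and this particular
BinaryExpr) is a hypothetical illustration, not code from Timon's
parser or from std.d.lexer; "i" and "0" follow the quoted convention
for identifier and integer-literal kinds.

```d
// Hypothetical token kinds; a real lexer would list many more.
enum TokenKind : ubyte { plus, star, dotDot, identifier, intLiteral }

// Spellings in the same order as TokenKind.
immutable string[] spellings = ["+", "*", "..", "i", "0"];

// Maps a spelling to its kind; usable at compile time (CTFE).
TokenKind lookup(string s)
{
    foreach (i, sp; spellings)
        if (sp == s)
            return cast(TokenKind) i;
    assert(0, "unknown token spelling: " ~ s);
}

// Tok!".." evaluates at compile time to the matching TokenKind.
template Tok(string s)
{
    enum Tok = lookup(s);
}

// Goes back from a kind to its symbolic form.
string symbol(TokenKind k) { return spellings[k]; }

// An AST node parameterised on its operator token.
struct BinaryExpr(TokenKind op)
{
    enum operator = symbol(op);
}
```

With this, Tok!".." and TokenKind.dotDot name the same value, and
BinaryExpr!(Tok!"+").operator gives back "+" without a separate table
of token names.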

>> 6. No clue how lookahead works with this.
>
> Eg. use a CircularBuffer adapter range. I have an implementation
> currently coupled with my own lexer implementation. If there is
> interest, I could factor it out.
>
> Lookahead is realized as follows in the parser:
>
> (assume 'code' is the circular buffer range.)
>
> // saves the state and mutes all error messages until the state is
> // restored
> auto saveState(){ muteerr++; return code.pushAnchor(); }
>
> void restoreState(Anchor state){ muteerr--; code.popAnchor(state); }
>
> The 'Anchor' is a trivial wrapper around a size_t. The circular buffer
> grows automatically to keep around tokens still reachable by an anchor.
> (The range only needs small constant space besides the buffer to support
> this functionality, though it is unable to detect usage errors.)
>
>
> This approach is typically more efficient than using a free list on
> contemporary architectures.
>

This ^^ is how. In fact, std.d.lexer internally does a similar thing
with non-random-access ranges of bytes.
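
A rough sketch of the anchor idea, under stated assumptions:
LookaheadBuffer is a hypothetical name, and for brevity it uses a
growable array rather than a true circular buffer. It is neither
Timon's CircularBuffer nor the std.d.lexer internals. Like the quoted
design, it does not detect usage errors such as popping anchors out of
order.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Trivial wrapper around a size_t, as in the quoted description.
struct Anchor { size_t pos; }

// Buffers an input range so a parser can rewind to a saved anchor.
struct LookaheadBuffer(R) if (isInputRange!R)
{
    private R source;
    private ElementType!R[] buf; // tokens still reachable
    private size_t cur;          // index of the current token in buf
    private size_t anchors;      // number of outstanding anchors

    this(R source) { this.source = source; }

    // Pulls one token from the source if the buffer is exhausted.
    private void fill()
    {
        if (cur == buf.length && !source.empty)
        {
            buf ~= source.front;
            source.popFront();
        }
    }

    @property bool empty() { fill(); return cur == buf.length; }
    @property auto front() { fill(); return buf[cur]; }

    void popFront()
    {
        fill();
        ++cur;
        if (anchors == 0)
        {
            // Nothing can rewind here, so consumed tokens are dropped.
            buf = buf[cur .. $];
            cur = 0;
        }
    }

    // Remembers the current position; tokens are kept from here on.
    Anchor pushAnchor() { ++anchors; return Anchor(cur); }

    // Rewinds to the anchored position and releases the anchor.
    void popAnchor(Anchor a) { --anchors; cur = a.pos; }
}
```

A parser would wrap pushAnchor/popAnchor in saveState/restoreState as
quoted above, try one alternative, and rewind if it fails.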


-- 
Dmitry Olshansky

