Writing a JFlex lexer for D - have an issue with cycles

Basile B. via Digitalmars-d digitalmars-d at puremagic.com
Sun Jan 22 15:20:27 PST 2017


On Sunday, 22 January 2017 at 22:11:08 UTC, FatalCatharsis wrote:
> I'm writing a flex lexer for D and I've hit a roadblock. It is 
> almost working EXCEPT for one specific production.
>
> StringLiteral is cyclic and I don't know how to approach it. It 
> is cyclic because:
>
>      Token -> StringLiteral -> TokenString -> Token
>
> To break the cycle, I was thinking I could just make a 
> production which is Token sans StringLiteral and instead subbed 
> with a production for StringLiteral that does not contain 
> TokenString, but that fundamentally changes the language. 
> Should the lexer really handle something like:
>
>     q{blah1q{20q{"meh"q{20.1q{blah}}}}}
>
> Lexically I don't know how this makes sense. To be clear, I'm 
> wondering if this is acceptable:
>
>     Token:
>         Identifier
>         StringLiteral
>         CharacterLiteral
>         IntegerLiteral
>         FloatLiteral
>         Keyword
>         Operator
>
>      StringLiteral:
>         WysiwygString
>         AlternateWysiwygString
>         DoubleQuotedString
>         HexString
>         DelimitedString
>         TokenString
>
>      TokenString:
>         q{ TokenNonNestedTokenStrings }
>
>
>      TokenNonNestedTokenStrings:
>         TokenNonNestedTokenString
>         TokenNonNestedTokenString TokenNonNestedTokenStrings
>
>      TokenNonNestedTokenString:
>         Identifier
>         StringLiteralNonNestedTokenString
>         CharacterLiteral
>         IntegerLiteral
>         FloatLiteral
>         Keyword
>         Operator
>
>      StringLiteralNonNestedTokenString:
>         WysiwygString
>         AlternateWysiwygString
>         DoubleQuotedString
>         HexString
>         DelimitedString
>
> Which basically disables nested token strings. Has anyone else 
> run into this issue?

One way to do this is not to do anything special for q{. Just add 
a token for q{ and continue normal lexing. The token string 
content must be valid tokens so it should work.

In fact it depends on what the scanner will be used for. For 
serious compiler work this is not acceptable; for highlighting 
or tools it might be okay. The real problem is that a token 
string should be printable, so it has to exist as a whole string 
before being tokenized.
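
To illustrate the "exists as a whole" point, here is a minimal sketch in plain Java (not JFlex syntax) of finding where a token string ends by counting brace depth, so that the whole q{...} can be emitted as a single StringLiteral token. The class and method names are hypothetical. It assumes that only double-quoted strings need to be skipped; a real D lexer would also have to skip comments, character literals such as '{', and the other string forms (wysiwyg, delimited, nested q{...}) before counting braces.

```java
final class TokenStringSketch {
    /**
     * Returns the index just past the '}' that closes the token string
     * beginning at {@code start}, or -1 if it is never closed.
     * Hypothetical helper: only double-quoted strings are skipped.
     */
    static int endOfTokenString(String src, int start) {
        if (!src.startsWith("q{", start)) {
            throw new IllegalArgumentException("expected q{ at index " + start);
        }
        int depth = 1;          // the opening q{ counts as one level
        int i = start + 2;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"') {
                // Skip a double-quoted string so braces inside it are ignored.
                i++;
                while (i < src.length() && src.charAt(i) != '"') {
                    if (src.charAt(i) == '\\') {
                        i++;    // skip the escaped character
                    }
                    i++;
                }
                i++;            // step over the closing quote
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i + 1;
                }
            }
            i++;
        }
        return -1;              // unterminated token string
    }

    public static void main(String[] args) {
        String src = "q{blah1q{20q{\"meh\"q{20.1q{blah}}}}}";
        int end = endOfTokenString(src, 0);
        System.out.println(end == src.length() ? "whole input is one token string"
                                               : "ends at index " + end);
    }
}
```

In a JFlex specification the same idea would probably be expressed with a separate lexical state entered on q{ and a depth counter kept in the scanner's user code, but that wiring is an assumption and is not shown here.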

