RegEx for a simple Lexer

anonymous via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Tue May 13 13:43:50 PDT 2014


On Tuesday, 13 May 2014 at 19:53:17 UTC, Tim Holzschuh via
Digitalmars-d-learn wrote:
> If I also want to create a RegEx to filter string-expressions a 
> la " xyz ", how would I do this?
>
> At least match( src, r"^\" (.*) $\" " ); doesn't seem to work 
> and I couldn't find in the Library Reference how to change it..

That string literal is malformed. WYSIWYG strings (r"...") don't
process escape sequences, so the backslash is taken literally and
the string ends at the second quote. The rest is syntactical
garbage to the compiler.
    "^\" (.*) $\" "
would be a proper D string literal. You could also use the
alternative WYSIWYG syntax:
    `^" (.*) $" `
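
To convince yourself that the two spellings are the same string,
here is a small compile-time check. It is only a sketch; the
second assert shows where an r"..." literal actually stops.

```d
// An escaped literal and a backtick (WYSIWYG) literal that spell
// exactly the same characters.
static assert("^\" (.*) $\" " == `^" (.*) $" `);

// r"..." ends at the first quote, so r"^\" is just the two
// characters '^' and '\'.
static assert(r"^\" == "^\\");

void main() {}
```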

That dollar sign looks off, though. It matches the end of the
input, so with a quote and a space still required after it, the
regex can never match. You probably want to put it at the end of
the regex:
    "^\" (.*) \"$"
Meaning: The match has to start at the beginning of the input
(^). Then it matches a quote, a space, anything (.*), a space,
and a quote. The match has to end at the end of the input ($).
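
A minimal sketch of that anchored regex in use, assuming
std.regex's matchFirst. The helper name wholeStringPayload is
made up for this example.

```d
import std.regex : matchFirst, regex;

// Hypothetical helper: returns the payload if the WHOLE input is
// a string expression like `" xyz "`, otherwise null.
string wholeStringPayload(string src)
{
    auto m = matchFirst(src, regex("^\" (.*) \"$"));
    return m.empty ? null : m[1];
}

void main()
{
    // The entire input is one string expression: capture 1 is the payload.
    assert(wholeStringPayload("\" xyz \"") == "xyz");

    // Anything after the closing quote makes $ fail.
    assert(wholeStringPayload("\" xyz \" + 1") is null);
}
```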

Then again, when you're writing a tokenizer/parser, you usually
don't require an expression to span the whole input, but just
match as far as it goes. In that case, drop the dollar sign. And
think about what happens when there are quotes in the payload.
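
Here is a rough sketch of the tokenizer-style variant, again
assuming std.regex's matchFirst. The helper lexString and the
sample input are invented for illustration. Using [^"]* for the
payload is just one simple way to stop at the first closing
quote; it does not handle escaped quotes inside the string.

```d
import std.regex : matchFirst, regex;

// Hypothetical helper: returns the string token at the start of
// src, or null if src does not start with one. No $ anchor, and
// the payload must not contain a quote.
string lexString(string src)
{
    auto m = matchFirst(src, regex(`^" ([^"]*) "`));
    return m.empty ? null : m.hit;
}

void main()
{
    auto src = `" xyz " ~ " abc "`;

    // Only the first string expression is consumed.
    assert(lexString(src) == `" xyz "`);
    assert(lexString(`foo " xyz "`) is null);

    // With a greedy .* the payload runs up to the LAST quote.
    auto greedy = matchFirst(src, regex(`^" (.*) "`));
    assert(greedy[1] == `xyz " ~ " abc`);
}
```

The lexer would then continue with whatever follows the returned
token.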

