[Issue 1466] Spec claims maximal munch technique always works: not for "1..3"

Mon Sep 3 04:08:59 PDT 2007

http://d.puremagic.com/issues/show_bug.cgi?id=1466

jascha at mainia.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jascha at mainia.de

------- Comment #5 from jascha at mainia.de  2007-09-03 06:08 -------
(In reply to comment #0)
> A snippet from http://digitalmars.com/d/1.0/lex.html:
> 
> "The source text is split into tokens using the maximal munch technique, i.e.,
> the lexical analyzer tries to make the longest token it can."
> 
> Relevant parts of the grammar:
> 
> Token:
>         FloatLiteral
>         ..
> 
> FloatLiteral:
>         Float
> 
> Float:
>         DecimalFloat
> 
> DecimalFloat:
>         DecimalDigits .
>         . Decimal
> 
> DecimalDigits:
>         DecimalDigit
> 
> DecimalDigit:
>         NonZeroDigit
> 
> Decimal:
>         NonZeroDigit
> 
> Based on the above, if a lexer encounters "1..3", for instance in a slice:
> "foo[1..3]", it should, using the maximal munch technique, make the longest
> possible token from "1..3": this is the Float "1.". Next, it should come up
> with the Float ".3".
> 
> Of course, this isn't currently happening, and would be problematic if it did.
> But, according to the grammar, that's what should happen, unless I'm missing
> something.
> 
> Either some exception needs to be made or remove the "DecimalDigits ."
> possibility from the grammar and the compiler.
> 

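To make the problem concrete, here is a minimal sketch in Python (not the D compiler's lexer) of a longest-match lexer whose FLOAT pattern follows the quoted grammar, including the "DecimalDigits ." form. The token names and the regular expressions are my own assumptions for illustration.

```python
import re

# Hypothetical token set. FLOAT deliberately mirrors the quoted grammar,
# including the "DecimalDigits ." form with a trailing dot.
TOKEN_PATTERNS = [
    ("FLOAT", re.compile(r"\d+\.\d*|\.\d+")),
    ("INT", re.compile(r"\d+")),
    ("DOTDOT", re.compile(r"\.\.")),
    ("LBRACKET", re.compile(r"\[")),
    ("RBRACKET", re.compile(r"\]")),
    ("IDENT", re.compile(r"[A-Za-z_]\w*")),
]


def lex(text):
    """Maximal munch: at each position, take the longest match of any pattern."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            # A strictly longer match wins; ties go to the earlier pattern.
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


print(lex("foo[1..3]"))
```

With these patterns the slice bounds come out as the two floats "1." and ".3", and the ".." token is never produced, which is exactly the behaviour comment #0 describes.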
(In reply to comment #1)
> Reply to d-bugmail at puremagic.com,
> 
> > http://d.puremagic.com/issues/show_bug.cgi?id=1466
> > 
> > Summary: Spec claims maximal munch technique always works: not for "1..3"
> > Product: D
> > Version: 1.020
> > Platform: All
> > URL: http://digitalmars.com/d/1.0/lex.html
> > OS/Version: All
> > Status: NEW
> > Keywords: spec
> > Severity: minor
> > Priority: P3
> > Component: www.digitalmars.com
> > AssignedTo: bugzilla at digitalmars.com
> > ReportedBy: deewiant at gmail.com
> 
> or make it "DecimalDigits . [^.]" where the ^ production is non-consuming.
> 

It is possible to parse D using a maximal munch lexer; see the seatd grammar
for an example. It is a matter of exactly which lexemes you choose. In this
particular case, the float lexemes need to be split so that floats with a
trailing dot are not matched by a single lexeme. With such a split, the
longest match at the start of "1..3" is the integer "1", followed by the
".." token and the integer "3".

--