More lexer questions

Sun Feb 12 07:34:52 PST 2012

On Sat, 11 Feb 2012 19:42:21 +0100, H. S. Teoh <hsteoh at quickfur.ath.cx> wrote:

> According to the online specs, the lexer tries to tokenize by maximal
> matching (with one exception, for ranges like "1..2").
> The fact that this exception is stated seems to indicate that it's
> permitted to have two literals side-by-side without an intervening
> space.
>
> So does that mean "1e2" should be tokenized as (float lit: 1e2) and
> "1f2" should be tokenized as (int lit: 1)(identifier: f2)?
It's "1f" (a float literal) followed by "2" (an int literal):
auto a = 1f;
pragma(msg, typeof(a));
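For illustration only, here is a rough sketch of maximal munch in Python rather than D. The token patterns are simplified assumptions, not D's actual lexical grammar. Each token takes the longest prefix its rule allows. That reproduces the "1f2" split above, the "123abcdefg" and hex cases from the question, and the "1..2" range exception:

```python
import re

# Simplified, hypothetical token rules (NOT the full D lexical grammar).
# Each pattern is matched at the current position and is greedy, so every
# token takes the longest prefix its rule allows (maximal munch).
HEX = re.compile(r"0[xX][0-9a-fA-F]+")
# The fraction needs a digit after '.', so "1..2" lexes as 1, .., 2;
# an exponent is only consumed when digits follow 'e'/'E'.
DEC = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?([fF])?")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
DOTDOT = re.compile(r"\.\.")

def tokenize(src):
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if m := HEX.match(src, pos):
            tokens.append(("int", m.group()))
        elif m := DEC.match(src, pos):
            # A fraction, exponent or f/F suffix makes the literal a float.
            kind = "float" if any(m.groups()) else "int"
            tokens.append((kind, m.group()))
        elif m := IDENT.match(src, pos):
            tokens.append(("ident", m.group()))
        elif m := DOTDOT.match(src, pos):
            tokens.append(("op", m.group()))
        else:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        pos = m.end()
    return tokens
```

With these assumed rules, tokenize("1f2") gives [("float", "1f"), ("int", "2")], and tokenize("0x123abcdefg") gives [("int", "0x123abcdef"), ("ident", "g")]. No octal rule is modelled, so "0129" comes out as a single decimal int.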
>
> Or, for that matter, "123abcdefg" should be tokenized as (int lit:
> 123)(identifier: abcdefg) whereas "0x123abcdefg" should be tokenized as
> (int lit: 0x123abcdef)(identifier: g)?
>
> Or worse, if we still allow octals, "0129" should be tokenized as (octal
> lit: 012)(int lit: 9)?
Octals are deprecated.
>
> Or do we expect that any integer/float literal will always span the
> longest string that has characters permitted in any numerical literal,
> and then after the fact the lexer will give an error if the string
> cannot be interpreted as a legal literal? IOW, "0129" will first be
> scanned in its entirety as a numerical literal, then afterwards the
> lexer decides that '9' doesn't belong in an octal so it throws an error
> (as opposed to maximally matching "012" as an octal literal followed by
> a decimal literal "9").  Or, for that matter, "0123xel.u123" will be
> scanned as a numerical literal (since all the characters in it occur in
> some kind of numerical literal), and then an error generated after the
> fact when the lexer realizes that this string isn't a legal numerical
> literal?
>
>
> T
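For contrast, here is a hypothetical Python sketch of the scan-then-validate scheme described in the question. The scanner first consumes the longest run of characters that may occur in some numeric literal. It then checks the whole run against a simplified, assumed literal grammar and reports an error if the run does not fit. Both the character set and the grammar are illustrative assumptions, not D's rules. Under them, "0129", "0123xel.u123", "123abcdefg" and even "1f2" are rejected outright. The maximal-munch reading, by contrast, splits "1f2" into two tokens. A special case for ".." keeps "1..2" working.

```python
import re

# Hypothetical set of every character that can occur somewhere in *some*
# numeric literal: digits, hex digits, radix marker, suffix letters, '.', '_'.
# Signed exponents ("1e-5") are left out for brevity.
NUMERIC_CHARS = set("0123456789abcdefABCDEFxXlLuUi._")

# Simplified, hypothetical notion of a legal literal (not D's real grammar):
# hex ints, octal-style ints with a leading 0, decimal ints, and floats with
# an optional unsigned exponent and an optional f/F suffix. Suffixes such as
# l, L, u, U, i are consumed by the scan but never accepted here.
LEGAL = re.compile(
    r"0[xX][0-9a-fA-F_]+"
    r"|0[0-7_]*"
    r"|[1-9][0-9_]*"
    r"|[0-9]+\.[0-9]+(?:[eE][0-9]+)?[fF]?"
    r"|[0-9]+[eE][0-9]+[fF]?"
    r"|[0-9]+[fF]"
)

def scan_number(src, pos):
    """Scan the longest run of numeric-literal characters, then validate it."""
    end = pos
    while end < len(src) and src[end] in NUMERIC_CHARS:
        # Stop before "..", so a range like "1..2" still lexes as 1, .., 2.
        if src.startswith("..", end):
            break
        end += 1
    text = src[pos:end]
    if not LEGAL.fullmatch(text):
        raise ValueError(f"malformed numeric literal {text!r}")
    return text, end
```

For example, scan_number("0x123abcdefg", 0) returns ("0x123abcdef", 11) because 'g' is outside the assumed character set. scan_number("0129", 0) raises ValueError instead of splitting the input into 012 and 9.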