More lexer questions

Sat Feb 11 10:42:21 PST 2012

According to the online specs, the lexer tokenizes by maximal matching,
with one exception for ranges like "1..2". The fact that this exception
is stated at all seems to indicate that two literals are permitted to
appear side-by-side without an intervening space.

So does that mean "1e2" should be tokenized as (float lit: 1e2) and
"1f2" should be tokenized as (int lit: 1)(identifier: f2)?

Or, for that matter, "123abcdefg" should be tokenized as (int lit:
123)(identifier: abcdefg) whereas "0x123abcdefg" should be tokenized as
(int lit: 0x123abcdef)(identifier: g)?

Or worse, if we still allow octals, "0129" should be tokenized as (octal
lit: 012)(int lit: 9)?
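To make the strict maximal-matching reading concrete, here is a minimal
Python sketch. It assumes a hypothetical, much-simplified grammar
(decimal, hex and leading-0 octal integers, floats with '.' or an
exponent, identifiers; no suffixes, underscores or binary literals), so
it is NOT D's actual lexical grammar. The pattern table and the name
tokenize_maximal_munch are made up for illustration. At each position it
takes the longest match that any token pattern can produce:

```python
import re

# Hypothetical, simplified token grammar (NOT D's real lexical grammar):
# no suffixes (L, u, f, i), no underscores, no binary literals.
TOKEN_PATTERNS = [
    ("float", re.compile(r"[0-9]+(?:\.[0-9]+)?[eE][+-]?[0-9]+|[0-9]+\.[0-9]+")),
    ("hex", re.compile(r"0[xX][0-9a-fA-F]+")),
    ("octal", re.compile(r"0[0-7]+")),
    ("int", re.compile(r"0|[1-9][0-9]*")),
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]


def tokenize_maximal_munch(text):
    """At each position, emit the longest token that any pattern matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            # Strictly longer wins; ties go to the pattern listed first.
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        if best_kind is None:
            raise ValueError(f"no token matches at position {pos}: {text[pos:]!r}")
        tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens


for sample in ["1e2", "1f2", "123abcdefg", "0x123abcdefg", "0129"]:
    print(sample, tokenize_maximal_munch(sample))
```

Under these assumptions it reproduces the splits asked about above: for
example "1f2" gives [('int', '1'), ('identifier', 'f2')] and "0129"
gives [('octal', '012'), ('int', '9')].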

Or do we expect that any integer/float literal will always span the
longest run of characters that are permitted in some numerical literal,
with the lexer giving an error after the fact if that string cannot be
interpreted as a legal literal? IOW, "0129" would first be scanned in
its entirety as a numerical literal, and only then would the lexer
decide that '9' doesn't belong in an octal and throw an error (as
opposed to maximally matching "012" as an octal literal followed by the
decimal literal "9"). Or, for that matter, "0123xel.u123" would be
scanned as a single numerical literal (since every character in it
occurs in some kind of numerical literal), with an error generated after
the fact when the lexer realizes that this string isn't a legal
numerical literal?
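The scan-then-validate alternative can be sketched in Python as well.
This assumes a hypothetical, much-simplified literal grammar (decimal,
hex and leading-0 octal integers; floats with '.' or an exponent; no
suffixes) plus a made-up set of "numeric" characters loosely modelled on
D, with exponent signs left out for brevity. The lexer first consumes
the longest run of such characters and only then checks whether the
whole run is a legal literal; the function names are illustrative only:

```python
import re

DIGITS = "0123456789"

# Hypothetical set of characters that may occur somewhere in a numeric
# literal, loosely modelled on D (hex digits, x/X prefix, l/L/u/U suffix
# letters, '.', '_'). Exponent signs are left out to keep the sketch short.
NUMERIC_CHARS = set(DIGITS + "abcdefABCDEFxXlLuU._")

# Hypothetical, simplified literal grammar, checked against the whole span.
LITERAL_PATTERNS = [
    ("float", re.compile(r"[0-9]+(?:\.[0-9]+)?[eE][0-9]+|[0-9]+\.[0-9]+")),
    ("hex", re.compile(r"0[xX][0-9a-fA-F]+")),
    ("octal", re.compile(r"0[0-7]+")),
    ("int", re.compile(r"0|[1-9][0-9]*")),
]
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def scan_number(text, pos):
    """Consume the longest run of numeric characters, then validate it as a whole."""
    end = pos
    while end < len(text) and text[end] in NUMERIC_CHARS:
        end += 1
    span = text[pos:end]
    for kind, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(span):
            return (kind, span), end
    # The run is rejected as a whole; no shorter prefix is tried.
    raise ValueError(f"malformed numeric literal {span!r} at position {pos}")


def tokenize_scan_then_validate(text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in DIGITS:
            token, pos = scan_number(text, pos)
        else:
            m = IDENTIFIER.match(text, pos)
            if m is None:
                raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
            token, pos = ("identifier", m.group()), m.end()
        tokens.append(token)
    return tokens


for sample in ["1e2", "0x123abcdefg", "0129", "0123xel.u123"]:
    try:
        print(sample, tokenize_scan_then_validate(sample))
    except ValueError as err:
        print(sample, "error:", err)
```

Here "0129" and "0123xel.u123" are rejected whole instead of being
split. Unlike strict maximal matching, "1f2" and "123abcdefg" also
become errors, since 'a' through 'f' are hex digits and get swallowed
into the numeric run. "1..2" would likewise be scanned as one malformed
span, so a lexer built this way would still need a special case for
ranges.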

T
