Template Metaprogramming Made Easy (Huh?)
Benji Smith
dlanguage at benjismith.net
Fri Sep 11 19:00:35 PDT 2009
Rainer Deyke wrote:
> I'm not entirely happy with the way Scala handles the division between
> statements - Scala's rules seem arbitrary and complex - but semicolons
> *are* noise, no matter how habitually I use them and how much time I
> waste removing them afterwards.
I don't know anything about Scala, but I've been working on an
ActionScript compiler recently (the language is based on ECMAScript, so
it's very much like JavaScript in this respect) and the optional
semicolon rules are completely maddening.
The ECMAScript spec basically says: a virtual semicolon must be inserted
at end-of-line whenever leaving it out would result in an erroneous
parse.
So there are really only three ways to handle it, and all of them are
insane:
1) Treat the newline character as a token (rather than as skippable
whitespace) and include that token as an optional construct in every
single production where it can legally occur. This results in hundreds
of optional newline tokens throughout the grammar, and makes the whole
thing a nightmare to read, but at least it still uses a one-pass CFG.
    CLASS :=
        "class"
        NEWLINE?
        IDENTIFIER
        NEWLINE?
        "{"
        NEWLINE?
        (
            MEMBER
            NEWLINE?
        )*
        "}"
2) Use lexical lookahead, dispatched from the parser. The tokenizer
determines whether to treat a newline as a statement terminator based on
the current parse state (are we in the middle of a parenthesized
expression?) and the upcoming tokens on the next line. This is nasty
because the grammar becomes context-sensitive and conflates lexical
analysis with parsing.
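A minimal Python sketch of option 2, under heavy simplifying assumptions: the "parse state" is reduced to a parenthesis depth counter kept inside the tokenizer, and the lookahead only checks whether the next token is a binary operator. The operator set and the function name are invented here, and this is not the actual ECMAScript rule.

```python
import re

TOKEN_RE = re.compile(r"\n|[ \t\r]+|\d+|[A-Za-z_]\w*|[-+*/=().;]")

# Assumption: a line that starts or ends with one of these continues a statement.
OPERATORS = {"+", "-", "*", "/", "=", "."}

def tokenize_with_context(src):
    """Turn newlines into ';' only where the context says a statement ended.

    Characters the regex does not recognise are silently skipped in this sketch.
    """
    raw = [t for t in TOKEN_RE.findall(src) if t == "\n" or not t.isspace()]
    out = []
    depth = 0  # stand-in for parser state: are we inside parentheses?
    for i, tok in enumerate(raw):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if tok != "\n":
            out.append(tok)
            continue
        # Lexical lookahead: the first real token on a following line, if any.
        upcoming = next((t for t in raw[i + 1:] if t != "\n"), None)
        inside_parens = depth > 0
        continues_below = upcoming in OPERATORS
        dangling_above = bool(out) and out[-1] in OPERATORS
        nothing_to_end = not out or out[-1] == ";"
        if not (inside_parens or continues_below or dangling_above or nothing_to_end):
            out.append(";")  # this newline acts as a statement terminator
    return out
```

The tokenizer now needs to know about parentheses and operators, which are really the parser's business, and that is the conflation complained about above.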
3) Whenever the parser encounters an error, have it back up to the
beginning of the previous production and insert a virtual semicolon into
the token stream. Then try reparsing. Since there might be multiple
newlines contained in a single multiline expression, it might take
arbitrarily many rewrite attempts before reaching a correct parse.
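Option 3 can be sketched in Python along the following lines. The toy expression grammar, the ParseError class, and the simplified back-up policy (insert at the nearest line break at or before the error rather than at the start of the previous production, with end of input counted as a line break) are all assumptions of this sketch.

```python
import re

TOKEN_RE = re.compile(r"\n|[ \t\r]+|\d+|[A-Za-z_]\w*|[-+*/=();]")
OPERATORS = {"+", "-", "*", "/", "="}

class ParseError(Exception):
    def __init__(self, index):
        super().__init__(f"unexpected token at index {index}")
        self.index = index

def tokenize(src):
    """Return (text, after_newline) pairs; the newlines themselves are dropped."""
    tokens, after_newline = [], False
    for text in TOKEN_RE.findall(src):
        if text == "\n":
            after_newline = True
        elif not text.isspace():
            tokens.append((text, after_newline))
            after_newline = False
    tokens.append(("<eof>", True))  # end of input counts as a line break
    return tokens

def parse_strict(tokens):
    """program := (expr ";")* <eof>. Semicolons are mandatory here."""
    pos = 0

    def operand():
        nonlocal pos
        text = tokens[pos][0]
        if text == "(":
            pos += 1
            expr()
            if tokens[pos][0] != ")":
                raise ParseError(pos)
            pos += 1
        elif text[0].isalnum() or text[0] == "_":
            pos += 1
        else:
            raise ParseError(pos)

    def expr():
        nonlocal pos
        operand()
        while tokens[pos][0] in OPERATORS:
            pos += 1
            operand()

    count = 0
    while tokens[pos][0] != "<eof>":
        expr()
        if tokens[pos][0] != ";":
            raise ParseError(pos)
        pos += 1
        count += 1
    return count  # number of statements parsed

def parse_with_insertion(src):
    """Reparse from scratch after each virtual semicolon is inserted."""
    tokens = tokenize(src)
    attempts = 0
    while True:
        try:
            return parse_strict(tokens), attempts
        except ParseError as err:
            # Back up to the nearest line break at or before the error that
            # has not already received a semicolon.
            for i in range(err.index, 0, -1):
                if tokens[i][1] and tokens[i - 1][0] != ";":
                    tokens.insert(i, (";", False))
                    attempts += 1
                    break
            else:
                raise  # no line break left to try: a genuine syntax error
```

For a three-line input with no semicolons, parse_with_insertion("a = 1\nb = (2\n+ 3)\nc = a + b\n") returns (3, 3): three statements, found only after three failed parses and three rewrites of the token stream.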
The thing about most compiler construction tools is that they don't
allow the parser to interact with the tokenizer (for context-guided
tokenization), and they're not designed for backup-and-retry processing
or the insertion of virtual tokens into the token stream.
Ugly stuff.
Anyhoo, I know this is waaaaaaay off topic. But I think any language
designer including optional semicolons in their language desperately
deserves a good swift punch in the teeth.
--benji
More information about the Digitalmars-d
mailing list