Template Metaprogramming Made Easy (Huh?)

Fri Sep 11 19:00:35 PDT 2009

Rainer Deyke wrote:
> I'm not entirely happy with the way Scala handles the division between
> statements - Scala's rules seem arbitrary and complex - but semicolons
> *are* noise, no matter how habitually I use them and how much time I
> waste removing them afterwards.

I don't know anything about Scala, but I've been working on an 
ActionScript compiler recently (the language is based on ECMAScript, so 
it's very much like JavaScript in this respect) and the optional 
semicolon rules are completely maddening.

The ECMAScript spec basically says: virtual semicolons must be inserted 
at end-of-line whenever the non-insertion of semicolons would result in 
an erroneous parse.

So there are really only three ways to handle it, and all of them are 
insane:

1) Treat the newline character as a token (rather than as skippable 
whitespace) and include that token as an optional construct in every 
single production where it can legally occur. This results in hundreds 
of optional semicolons throughout the grammar, and makes the whole thing 
a nightmare to read, but at least it still uses a one-pass CFG.

     CLASS :=
       "class"
       NEWLINE?
       IDENTIFIER
       NEWLINE?
       "{"
       NEWLINE?
       (
         MEMBER
         NEWLINE?
       )*
       "}"
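To make approach 1 concrete, here is a minimal recursive-descent sketch in Python that follows the CLASS production above, with NEWLINE as a real token the parser may optionally consume. The token-list format, the `NEWLINE` marker, and the rule that a MEMBER is a single identifier are hypothetical simplifications. A real lexer would also have to collapse runs of blank lines into one NEWLINE token, because `NEWLINE?` matches at most one.

```python
# Approach 1 as recursive descent: NEWLINE is a real token, and wherever the
# grammar allows one, the parser optionally consumes it.
# Token format and MEMBER := IDENTIFIER are hypothetical simplifications.

NEWLINE = "\n"
KEYWORDS = {"class"}


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def optional_newline(self):
        # NEWLINE? : consume one newline token if present, else do nothing
        if self.peek() == NEWLINE:
            self.pos += 1

    def identifier(self):
        tok = self.peek()
        if tok is None or not tok.isidentifier() or tok in KEYWORDS:
            raise ParseError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def parse_class(self):
        # CLASS := "class" NEWLINE? IDENTIFIER NEWLINE? "{" NEWLINE?
        #          (MEMBER NEWLINE?)* "}"
        self.expect("class")
        self.optional_newline()
        name = self.identifier()
        self.optional_newline()
        self.expect("{")
        self.optional_newline()
        members = []
        while self.peek() not in ("}", None):
            members.append(self.identifier())  # hypothetical MEMBER rule
            self.optional_newline()
        self.expect("}")
        return name, members


tokens = ["class", NEWLINE, "Foo", "{", NEWLINE, "x", NEWLINE, "y", "}"]
print(Parser(tokens).parse_class())  # ('Foo', ['x', 'y'])
```

Even in this tiny fragment, four of the nine grammar steps exist only to skip newlines, which is the readability cost described above.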

2) Use lexical lookahead, dispatched from the parser. The tokenizer 
determines whether to treat a newline as a statement terminator based on 
the current parse state (are we in the middle of a parenthesized 
expression?) and the upcoming tokens on the next line. This is nasty 
because the grammar becomes context-sensitive and conflates lexical 
analysis with parsing.
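A rough Python sketch of approach 2: the parser (here a stand-in that only tracks parenthesis depth) passes its state to the lexer on every call, and the lexer peeks at the first token of the next line before deciding what a newline means. The toy token syntax, the `OPERATORS` set, and the specific rule (a newline is whitespace inside parentheses, at the start or end of input, or next to a binary operator) are assumptions for illustration, not the ECMAScript rules.

```python
import re

# Hypothetical toy token syntax: numbers, identifiers, a few operators,
# parentheses, commas and newlines. Spaces and tabs are skipped.
RAW_TOKEN = re.compile(r"[ \t]*(\n|\d+|[A-Za-z_]\w*|[-+*/=(),])")

# Assumed rule for this sketch: a line that ends or begins with a binary
# operator continues onto the next line.
OPERATORS = {"+", "-", "*", "/", "="}


def raw_tokens(src):
    src = src.rstrip(" \t")
    pos = 0
    while pos < len(src):
        m = RAW_TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"bad character at offset {pos}")
        yield m.group(1)
        pos = m.end()


class ContextLexer:
    """Lexer whose treatment of newlines depends on parser state."""

    def __init__(self, src):
        self.toks = list(raw_tokens(src))
        self.pos = 0
        self.prev = None  # last token handed to the parser

    def next_token(self, paren_depth):
        """Return the next token; paren_depth is supplied by the parser."""
        while self.pos < len(self.toks):
            tok = self.toks[self.pos]
            self.pos += 1
            if tok != "\n":
                self.prev = tok
                return tok
            # Skip any blank lines and peek at the first token of the next line.
            while self.pos < len(self.toks) and self.toks[self.pos] == "\n":
                self.pos += 1
            nxt = self.toks[self.pos] if self.pos < len(self.toks) else None
            if (paren_depth > 0 or nxt is None or self.prev is None
                    or self.prev in OPERATORS or nxt in OPERATORS):
                continue  # the newline is just whitespace here
            self.prev = ";"
            return ";"  # the newline acts as a statement terminator
        return None


def lex_all(src):
    # Stand-in for the parser: its only "parse state" is paren depth,
    # which it hands to the lexer on every call.
    lexer = ContextLexer(src)
    depth, out = 0, []
    while (tok := lexer.next_token(depth)) is not None:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        out.append(tok)
    return out


print(lex_all("x = 1\ny = max(a,\nb)\nz = y\n* 4\n"))
# ['x', '=', '1', ';', 'y', '=', 'max', '(', 'a', ',', 'b', ')', ';', 'z', '=', 'y', '*', '4']
```

The line break after `max(a,` stays whitespace only because of the depth the parser passed in; that dependency is exactly the lexer/parser coupling that makes this approach nasty.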

3) Whenever the parser encounters an error, have it back up to the 
beginning of the previous production and insert a virtual semicolon into 
the token stream. Then try reparsing. Since there might be multiple 
newlines contained in a single multiline expression, it might take 
arbitrarily many rewrite attempts before reaching a correct parse.
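A Python sketch of approach 3's retry loop on a hypothetical toy language of assignments like `x = a + 1`. The lexer drops newlines but remembers which tokens begin a line; whenever the parse fails on such a token, a virtual `;` is inserted in front of it and the whole program is reparsed from scratch. Inserting directly before the offending token, rather than backing up to the start of the previous production, is a simplification that happens to suffice for this one-token-lookahead grammar; `lex`, `parse`, and `parse_with_asi` are made-up names.

```python
import re

# Hypothetical toy language: assignments such as "x = a + 1", separated by
# ";" or (via virtual semicolons) by line breaks.
TOKEN = re.compile(r"[ \t]*(\n|\d+|[A-Za-z_]\w*|[-+=;])")


class ParseError(Exception):
    def __init__(self, index):
        super().__init__(f"unexpected token at index {index}")
        self.index = index


def lex(src):
    """Return (tokens, indices of tokens that begin a new line)."""
    src = src.rstrip()
    tokens, line_starts, pos, after_newline = [], set(), 0, False
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"bad character at offset {pos}")
        pos = m.end()
        if m.group(1) == "\n":
            after_newline = True  # newlines are dropped but remembered
            continue
        if after_newline:
            line_starts.add(len(tokens))
        tokens.append(m.group(1))
        after_newline = False
    return tokens, line_starts


def is_name(tok):
    return tok[0].isalpha() or tok[0] == "_"


def is_atom(tok):
    return tok.isdigit() or is_name(tok)


def parse(tokens):
    """Program := Stmt (";" Stmt)* [";"]   Stmt := NAME "=" Atom (("+"|"-") Atom)*"""
    pos, stmts = 0, []

    def take(pred):
        nonlocal pos
        if pos >= len(tokens) or not pred(tokens[pos]):
            raise ParseError(pos)
        pos += 1
        return tokens[pos - 1]

    while pos < len(tokens):
        name = take(is_name)
        take(lambda t: t == "=")
        expr = [take(is_atom)]
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            expr.append(take(lambda t: True))
            expr.append(take(is_atom))
        stmts.append((name, " ".join(expr)))
        if pos < len(tokens):
            take(lambda t: t == ";")
    return stmts


def parse_with_asi(src):
    """Reparse from scratch, inserting one virtual ';' per failed attempt."""
    tokens, line_starts = lex(src)
    retries = 0
    while True:
        try:
            return parse(tokens), retries
        except ParseError as err:
            i = err.index
            # Retry only if a line break precedes the offending token and
            # there is not already a semicolon in front of it.
            if i == 0 or i not in line_starts or tokens[i - 1] == ";":
                raise
            tokens.insert(i, ";")
            # Every line start at or after i shifts right by one token.
            line_starts = {j + 1 if j >= i else j for j in line_starts}
            retries += 1


print(parse_with_asi("x = 1\ny = x +\n2\nz = y - 3\n"))
# ([('x', '1'), ('y', 'x + 2'), ('z', 'y - 3')], 2)
```

The line break after `+` never gets a semicolon because the parse does not fail there. The two that are needed cost two complete reparses, and in the worst case the number of retries grows with the number of line breaks in the program.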

The thing about most compiler construction tools is that they don't 
let the parser guide the tokenizer the way context-sensitive 
tokenization requires, and they're not designed for backup-and-retry 
processing or for inserting virtual tokens into the token stream.

Ugly stuff.

Anyhoo, I know this is waaaaaaay off topic. But I think any language 
designer including optional semicolons in their language desperately 
deserves a good swift punch in the teeth.

--benji