Interpreting the D grammar

Sun Aug 2 10:15:39 PDT 2015

On Sunday, 2 August 2015 at 14:50:35 UTC, Jacob Carlborg wrote:
> I'm trying to read the D grammar [1] to enhance the D TextMate 
> bundle. If we take the add expression as an example. It's 
> defined like this in the grammar:
>
> AddExpression:
>     MulExpression
>     AddExpression + MulExpression
>     AddExpression - MulExpression
>     CatExpression
>
> And like this in the grammar made by Brian [2]:
>
> addExpression:
>       mulExpression
>     | addExpression ('+' | '-' | '~') mulExpression
>     ;
>
> I'm not so familiar with grammars but this looks like it's 
> recursive. Is it possible to translate this piece of grammar to 
> a regular expression? TextMate uses regular expressions and a 
> couple of enhancements/extensions to define a grammar for a 
> language.
>
> [1] http://dlang.org/grammar.html
> [2] https://rawgit.com/Hackerpilot/DGrammar/master/grammar.html

I guess you're not familiar with the theoretical aspect of 
"formal languages". The D grammar is a context-free grammar which 
cannot be reduced to a regular expression. As cym13 stated, there 
are some simple context-free grammars which can be rewritten as 
regular expressions, but the D grammar cannot be. Take a look at 
the Chomsky Hierarchy [1] for a better understanding.

The classic example of a context-free language is the set of 
balanced parenthesis, i.e. (()) is balanced and ())))) is not. 
This language is not regular meaning you cannot write a regular 
expression for it, but you can write a context-free grammar for 
it.

[1] https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy