Interpreting the D grammar

MakersF via Digitalmars-d digitalmars-d at puremagic.com
Thu Aug 6 01:26:56 PDT 2015


On Sunday, 2 August 2015 at 18:22:01 UTC, Jacob Carlborg wrote:
> On 02/08/15 19:15, Xinok wrote:
>
>> I guess you're not familiar with the theoretical aspect of
>> "formal languages". The D grammar is a context-free grammar
>> which cannot be reduced to a regular expression. As cym13
>> stated, there are some simple context-free grammars which can
>> be rewritten as regular expressions, but the D grammar cannot
>> be. Take a look at the Chomsky Hierarchy [1] for a better
>> understanding.
>>
>> The classic example of a context-free language is the set of
>> balanced parenthesis, i.e. (()) is balanced and ())))) is not.
>> This language is not regular meaning you cannot write a regular
>> expression for it, but you can write a context-free grammar for
>> it.
>
> TextMate grammars are not _just_ regular expressions. They can 
> define balanced parentheses [1].
>
> The point of a language grammar in a text editor is not to have 
> a 100% correct implementation of the grammar. Rather it should 
> syntax highlight the code in a way that is useful for the user.
>
> [1] https://manual.macromates.com/en/language_grammars

Then your best shot is to approximate the grammar with the regular 
expressions you have access to. You'll get to a point where some 
constructs cannot be represented correctly; at that point you 
should probably write a regex which matches everything the grammar 
produces, plus some more.

In the earlier example of paired, interleaved parentheses, you 
could write a regex that accepts every possible sequence of 
bracket characters, like
( (|)|[|]|{|}|" )*
where the outer parentheses, the | separators and the trailing * 
are regex syntax, and every other character is a literal. That 
regex matches every string the paired-parentheses grammar 
produces, and many more strings besides.
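
A minimal sketch of that idea in Python (not part of the original 
post; the names LOOSE, loose_match and balanced are hypothetical, 
and the alternation is rewritten as an escaped character class so 
that Python's re module accepts it). It compares the 
over-approximating regex with a stack-based check for properly 
paired brackets:

```python
import re

# Assumption: the alternation from the post, written as a character
# class with the square brackets escaped for Python's re module.
LOOSE = re.compile(r'[()\[\]{}"]*')

# Closing bracket -> the opening bracket it must pair with.
PAIRS = {")": "(", "]": "[", "}": "{"}


def loose_match(text: str) -> bool:
    """True if text consists only of bracket and quote characters."""
    return LOOSE.fullmatch(text) is not None


def balanced(text: str) -> bool:
    """True if text is a properly nested sequence of (), [] and {}.

    This needs a stack, which is what a plain regular expression lacks.
    Quotes and all other characters are rejected to keep the sketch small.
    """
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False
    return not stack


if __name__ == "__main__":
    for sample in ["(())", "([]{})", "())))", "([)]"]:
        print(sample, loose_match(sample), balanced(sample))
```

Every string that balanced() accepts is also accepted by 
loose_match(), but not the other way round: the regex also lets 
through ())))) and ([)], which is the "some more" part.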

At the end of the day you want to highlight correct syntax, and 
if a user writes wrong syntax it is OK to highlight it wrongly, so 
be sure your regex works for the right syntax; it can do random 
stuff for the wrong one.

