Context-Free Grammars? What about arrays?

Nick Sabalausky a at a.a
Mon Feb 14 14:40:50 PST 2011


"%u" <wfunction at hotmail.com> wrote in message 
news:ij3la9$t53$1 at digitalmars.com...
>
> The problem is that it _is_ ambiguous what rule to apply. To me, just 
> because
> static arrays and associative arrays happen to have similar _looks_ 
> doesn't make
> parsing them context-free -- they're defined with completely separate 
> syntaxes,
> and they just happen to look identical.

Yes, they just happen to look identical: but that's the crux: How it "looks" 
is the *only* thing that parser/grammar are concerned with. The fact that it 
could be one of two different *types* is irrelevent since type systems are 
purely a semantic matter and (at least in D) what type something is never 
affects how the code's *structure* is interpreted.

True, there are times when different types are distinguished via different 
syntaxes (for instance, function vs integer vs array), but that's 
incidental. The important thing is that that semantic-level type information 
never actually needs to feed back into the parser.

In the case of "int[U] s;", the *only* thing the parser is concerned with is 
"Ok, this is a variable declaration, and a variable declaration is a 
statement, and a statement can occur in x, y and z places". All of that 
stays exactly the same no matter what "U" is and no matter what type 
"int[U]" is. Anything beyond that is not the parser's job.

Consider this:

double a;
bool b;

Two different variables that are treated very differently. But the 
*structure* is exactly the same: {type} {identifier} {semicolon}. In all 
cases, that means "variable declaration". And a variable declaration is a 
statement, etc. For both variables, that structure is identical, and that's 
all that the parser is concerned with. In a grammar-definition that would 
look something like this:

<Var Decl> ::= <Type> <Ident> ';'
<Statement> ::= <Var Decl>
                       |  any other type of statement here
                       |  etc

Likewise, in the case of "int[U] s;" we have:

<Var Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'

Note that everything about that is true no matter what the ident inside the 
brackets is. Also note that there is *nothing* in there the pulls in any 
information from the semantic phase. If it did, it wouldn't be context-free.

If it worked like this:

<Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
<Assoc Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'

...*Then* it would no longer be context-free because the parser would have 
to rely on the semantics of 'U' to determine how to parse it. But D doesn't 
do that. It doesn't even bother distinguishing between array declarations 
and assoc array declarations until the semantic phase. Therefore the grammar 
(ie, the parsing) is still context-free.

One other thing that would prevent it from being context-free would be if 
the left-side of a production rule had more than one token:

<A> <B> ::= etc...


> The fact that the compiler has to do a
> semantic analysis in order to figure this out (which was originally a 
> syntax
> problem) really makes me wonder why you mention that D is still 
> context-free.
>

It's not a syntax problem unless the grammar defines it to be a syntax 
problem. Anything that's defined in the grammar is syntax; anything that 
isn't defined in the grammar is not syntax. Also, remember that grammar only 
apples to lexing/parsing and does not specify anything about semantics (and 
"sementics" effectively means "anything that isn't handled by the lexer or 
the parser").





More information about the Digitalmars-d-learn mailing list