Context-Free Grammars? What about arrays?
Nick Sabalausky
a at a.a
Mon Feb 14 14:40:50 PST 2011
"%u" <wfunction at hotmail.com> wrote in message
news:ij3la9$t53$1 at digitalmars.com...
>
> The problem is that it _is_ ambiguous what rule to apply. To me, just
> because
> static arrays and associative arrays happen to have similar _looks_
> doesn't make
> parsing them context-free -- they're defined with completely separate
> syntaxes,
> and they just happen to look identical.
Yes, they just happen to look identical: but that's the crux: How it "looks"
is the *only* thing that parser/grammar are concerned with. The fact that it
could be one of two different *types* is irrelevent since type systems are
purely a semantic matter and (at least in D) what type something is never
affects how the code's *structure* is interpreted.
True, there are times when different types are distinguished via different
syntaxes (for instance, function vs integer vs array), but that's
incidental. The important thing is that that semantic-level type information
never actually needs to feed back into the parser.
In the case of "int[U] s;", the *only* thing the parser is concerned with is
"Ok, this is a variable declaration, and a variable declaration is a
statement, and a statement can occur in x, y and z places". All of that
stays exactly the same no matter what "U" is and no matter what type
"int[U]" is. Anything beyond that is not the parser's job.
Consider this:
double a;
bool b;
Two different variables that are treated very differently. But the
*structure* is exactly the same: {type} {identifier} {semicolon}. In all
cases, that means "variable declaration". And a variable declaration is a
statement, etc. For both variables, that structure is identical, and that's
all that the parser is concerned with. In a grammar-definition that would
look something like this:
<Var Decl> ::= <Type> <Ident> ';'
<Statement> ::= <Var Decl>
| any other type of statement here
| etc
Likewise, in the case of "int[U] s;" we have:
<Var Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
Note that everything about that is true no matter what the ident inside the
brackets is. Also note that there is *nothing* in there the pulls in any
information from the semantic phase. If it did, it wouldn't be context-free.
If it worked like this:
<Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
<Assoc Array Decl> ::= <Type> '[' <Ident> ']' <Ident> ';'
...*Then* it would no longer be context-free because the parser would have
to rely on the semantics of 'U' to determine how to parse it. But D doesn't
do that. It doesn't even bother distinguishing between array declarations
and assoc array declarations until the semantic phase. Therefore the grammar
(ie, the parsing) is still context-free.
One other thing that would prevent it from being context-free would be if
the left-side of a production rule had more than one token:
<A> <B> ::= etc...
> The fact that the compiler has to do a
> semantic analysis in order to figure this out (which was originally a
> syntax
> problem) really makes me wonder why you mention that D is still
> context-free.
>
It's not a syntax problem unless the grammar defines it to be a syntax
problem. Anything that's defined in the grammar is syntax; anything that
isn't defined in the grammar is not syntax. Also, remember that grammar only
apples to lexing/parsing and does not specify anything about semantics (and
"sementics" effectively means "anything that isn't handled by the lexer or
the parser").
More information about the Digitalmars-d-learn
mailing list