Goldie Parsing System v0.4 Released - Now for D2

Nick Sabalausky a at a.a
Sat Apr 16 00:40:04 PDT 2011


"Nick Sabalausky" <a at a.a> wrote in message 
news:ioanmi$82c$1 at digitalmars.com...
> Andrej Mitrovic Wrote:
>
>> What I meant was that code like this will throw if MyType isn't
>> defined anywhere:
>>
>> int main(int x)
>> {
>>     MyType var;
>> }
>>
>> goldie.exception.UnexpectedTokenException at src\goldie\exception.d(35):
>> test.c(3:12): Unexpected Id: 'var'
>>
>> It looks like valid C /syntax/, except that MyType isn't defined. But
>> this will work:
>> struct MyType {
>>        int field;
>> };
>> int main(int x)
>> {
>>     struct MyType var;
>> }
>>
>> So either Goldie or ParseAnything needs to have all types defined.
>> Maybe this is obvious, but I wouldn't know since I've never used a
>> parser before. :p
>>
>> Oddly enough, this one will throw:
>> typedef struct {
>>     int field;
>> } MyType;
>> int main(int x)
>> {
>>     MyType var;
>> }
>>
>> goldie.exception.UnexpectedTokenException at src\goldie\exception.d(35):
>> test.c(7:12): Unexpected Id: 'var'
>>
>> This one will throw as well:
>> struct SomeStruct {
>>     int field;
>> };
>> typedef struct SomeStruct MyType;
>> int main(int x)
>> {
>>     MyType var;
>> }
>>
>> goldie.exception.UnexpectedTokenException at src\goldie\exception.d(35):
>> test.c(13:12): Unexpected Id: 'myvar'
>>
>> Isn't typedef a part of ANSI C?
>
> I'm not at my computer right now, so I can't check, but it sounds like the 
> grammar follows the really old C-style of requiring structs to be declared 
> with "struct StructName varName". Apparently it doesn't take into account 
> the possibility of typedefs being used to eliminate that. When I get home, 
> I'll check; I think it may be an easy change to the grammar.
>

Yeah, turns out that grammar just doesn't support using user-defined types 
unless they're preceded by "struct", "union", or "enum". You can see that 
here:

<Var Decl>     ::= <Mod> <Type> <Var> <Var List>  ';'
                 |       <Type> <Var> <Var List>  ';'
                 | <Mod>        <Var> <Var List>  ';'

<Mod>      ::= extern
             | static
             | register
             | auto
             | volatile
             | const

<Type>     ::= <Base> <Pointers>

<Base>     ::= <Sign> <Scalar>  ! Ie, the built-ins like char, signed int, etc...
             | struct Id
             | struct '{' <Struct Def> '}'
             | union Id
             | union '{' <Struct Def> '}'
             | enum Id

So when you use "MyType" instead of "struct MyType", the parser sees 
"MyType", assumes it's a variable since it doesn't match any of the <Type> 
forms above, and then barfs on "var" because "variable1 variable2" isn't 
valid C code. Normally, you'd just add another form to <Base> (i.e., add a 
line after "  | enum Id" that says "  | Id"). Except, the problem is...

C is notorious for types and variables being ambiguous with each other. For 
example, "a * b;" is a pointer declaration if "a" names a type, but a 
multiplication if "a" is a variable. So the distinction pretty much has to 
be made in the semantic phase (i.e., outside of the formal grammar). But this 
grammar seems to be trying to make that distinction anyway. So trying to fix 
it by simply adding a "<Base> ::= Id" leads to ambiguity problems with types 
versus variables/expressions. That's probably why they didn't enhance the 
grammar that far: their "separation of type and variable" approach doesn't 
really work for C.
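
For illustration, one common way C front ends make that distinction outside 
the grammar is to keep a table of typedef'd names and consult it when 
classifying identifiers (often called the "lexer hack"). The sketch below is 
only a rough Python illustration of that idea. It is not Goldie's or GOLD's 
API, and nothing here says either tool can hook its lexer this way. The names 
classify, KEYWORDS, and TOKEN_RE are made up for the example, and the typedef 
tracking is deliberately simplistic: it assumes the new name is the last 
identifier before the closing ';', so function-pointer and array typedefs are 
not handled.

```python
import re

# Hypothetical, abbreviated keyword list; a real C lexer would have many more.
KEYWORDS = {"typedef", "struct", "union", "enum", "int", "char", "void"}

# Identifiers, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def classify(source):
    """Tag each token, using a typedef table to split 'TypeName' from 'Id'."""
    typedef_names = set()   # names introduced by typedefs seen so far
    tagged = []
    in_typedef = False
    depth = 0               # brace nesting inside the current typedef
    last_ident = None       # most recent top-level identifier in the typedef

    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            kind = "Keyword"
        elif tok[0].isalpha() or tok[0] == "_":
            # The semantic decision the grammar alone cannot make.
            kind = "TypeName" if tok in typedef_names else "Id"
        elif tok[0].isdigit():
            kind = "Number"
        else:
            kind = "Symbol"
        tagged.append((kind, tok))

        if tok == "typedef":
            in_typedef, depth, last_ident = True, 0, None
        elif in_typedef:
            if tok == "{":
                depth += 1
            elif tok == "}":
                depth -= 1
            elif kind == "Id" and depth == 0:
                last_ident = tok
            elif tok == ";" and depth == 0:
                # Simplifying assumption: the declared name is the last
                # identifier before the terminating ';'.
                if last_ident is not None:
                    typedef_names.add(last_ident)
                in_typedef = False
    return tagged
```

Run on the "typedef struct { int field; } MyType;" example from the quoted 
message, the "MyType" inside main comes back tagged as TypeName and "var" as 
Id, while in the version with no typedef at all "MyType" stays a plain Id. A 
grammar could then use a separate TypeName terminal in <Base> instead of Id, 
which avoids the ambiguity described above.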

I'll have to think a bit on how best to adjust it. You can also check the 
GOLD mailing lists here to see if anyone has another C grammar:

http://www.devincook.com/goldparser/contact.htm




