Overlapping functionality: IFTI, templates, is-expressions

Russell Lewis webmaster at villagersonline.com
Wed Mar 19 21:44:34 PDT 2008


BCS wrote:
>> or!(':',
>> Parse_block,  /* parses a nonterminal */
>> chain!("import", array!(IDENT, ','), ';'),
>> ... many other alternatives ...)
>> (delegate void(... auto-generated arg types ...)
>> {
>> ... process successful parse here ...
>> });
> 
> Errk!! Ow, Ow, ow. Too much extra syntax! (for my taste) Mine works off a 
> single string:

True!  But at this stage of development, what I'm attempting is an 
as-simple-as-I-can-get-it template parser library.  As I see it, the 
*next* step is to build a string parser that calls the parser library. 
And, if I say so myself, I'm getting pretty close to "as dense as you 
can get without having to parse strings at compile time."

Of course, once you have a parser library that works, even if it's a 
little clunky, you could hopefully use it to scan input strings, build 
parse trees, and then generate parsers from those trees. :)
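For readers who haven't seen the combinator style quoted at the top, here is a rough runtime sketch in Python of what `or!`, `chain!` and `array!` might do. It is only an analog under stated assumptions. The real library builds its parsers at compile time with D templates. The names `lit`, `ident`, `chain`, `or_`, `array`, `action` and `KEYWORDS` are hypothetical, and input is assumed to be already split into tokens by a lexer. Each parser takes a token list and a position and returns `(value, new_position)`, or `None` on failure.

```python
from typing import Any, Callable, List, Optional, Tuple

# A parser takes (tokens, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[List[str], int], Optional[Tuple[Any, int]]]

KEYWORDS = {"module", "import"}  # hypothetical keyword set for this sketch

def lit(expected: str) -> Parser:
    """Match one exact token: a keyword such as "import" or a symbol such as ';'."""
    def parse(toks: List[str], pos: int):
        if pos < len(toks) and toks[pos] == expected:
            return toks[pos], pos + 1
        return None
    return parse

def ident(toks: List[str], pos: int):
    """Match a lexer-level IDENT token: an identifier that is not a keyword."""
    if pos < len(toks) and toks[pos].isidentifier() and toks[pos] not in KEYWORDS:
        return toks[pos], pos + 1
    return None

def chain(*parts: Parser) -> Parser:
    """Match every part in sequence; the value is the list of the parts' values."""
    def parse(toks: List[str], pos: int):
        values = []
        for part in parts:
            result = part(toks, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def or_(*alternatives: Parser) -> Parser:
    """Ordered choice: return the result of the first alternative that succeeds."""
    def parse(toks: List[str], pos: int):
        for alt in alternatives:
            result = alt(toks, pos)
            if result is not None:
                return result
        return None
    return parse

def array(element: Parser, sep: str) -> Parser:
    """Match one or more `element`s separated by `sep` and collect them into a list."""
    def parse(toks: List[str], pos: int):
        first = element(toks, pos)
        if first is None:
            return None
        value, pos = first
        items = [value]
        while pos < len(toks) and toks[pos] == sep:
            nxt = element(toks, pos + 1)
            if nxt is None:
                break  # leave the trailing separator unconsumed
            value, pos = nxt
            items.append(value)
        return items, pos
    return parse

def action(parser: Parser, fn: Callable[..., Any]) -> Parser:
    """On success, pass the sub-values to `fn`, like the delegate in the quoted code."""
    def parse(toks: List[str], pos: int):
        result = parser(toks, pos)
        if result is None:
            return None
        values, pos = result
        return fn(*values), pos
    return parse

# Rough analog of: or!(..., chain!("import", array!(IDENT, ','), ';'), ...)(delegate ...)
import_decl = action(chain(lit("import"), array(ident, ","), lit(";")),
                     lambda _kw, names, _semi: {"import": names})
module_decl = action(chain(lit("module"), ident, lit(";")),
                     lambda _kw, name, _semi: {"module": name})
declaration = or_(module_decl, import_decl)

print(declaration(["import", "a", ",", "b", ";"], 0))  # ({'import': ['a', 'b']}, 5)
```

Note that `or_` here commits to the first alternative that succeeds (PEG-style ordered choice). It therefore does not explore ambiguous grammars, which is one of the design questions that follow.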



So, tell me a bit about your parser.  How do you handle things like:
* Single character tokens (matching the . operator in D syntax, for
   instance)
* Multi-character tokens (matching D keywords)
* Lexer-recognized tokens (IDENT, NUM, CHAR_LITERAL, etc.)

Also, how well does it handle ambiguous grammars?

Is there any built-in way to handle repeated sequences of elements and 
turn them into arrays?

Finally, how do you define the action-code?



As a little more background, I actually wrote a parser generator that 
generated parsers from an extended-Bison syntax.  It was slow but 
functional, and the parsers it generated produced auto-generated D 
structs and arrays that mirrored the parse tree.  Even so, I found the 
grammar code hard to read.  Here are the first few lines of the 
grammar for D:

BEGIN CODE (IN MY BISON-DERIVED GRAMMAR)
   %token IDENT   %{char*%}
   %token NUM     %{char*%}
   %token STRING  %{char*%}
   %token CHAR    %{char*%}
   %token SHEBANG %{char*%}

   module_df: // http://digitalmars.com/d/module.html
           SHEBANG? ("module" IDENT_list=name ';')? /* HACK: *= is broken */ (decl_def+=decl_defs)?
   ;

   IDENT_list:
           [IDENT=ident,'.']+=name
   ;
END CODE
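The generator's output is described as auto-generated D structs and arrays that mirror the parse tree. Here is a rough Python analog for the `module_df` and `IDENT_list` rules. It is a hypothetical sketch and not the generator's actual output. The class and function names are invented for illustration. The sketch hand-parses only the optional `module a.b.c;` header. It ignores `SHEBANG` and leaves `decl_defs` as an empty list instead of parsing it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IdentList:
    # Mirrors `[IDENT=ident,'.']+=name`: the dot-separated identifiers.
    name: List[str] = field(default_factory=list)

@dataclass
class ModuleDf:
    # Mirrors `("module" IDENT_list=name ';')?`: None when there is no module header.
    name: Optional[IdentList] = None
    # Mirrors `(decl_def+=decl_defs)?`; not parsed in this sketch.
    decl_defs: List[object] = field(default_factory=list)

def parse_ident_list(toks: List[str], pos: int):
    """Parse IDENT ('.' IDENT)* and return (IdentList, new_pos), or None."""
    if pos >= len(toks) or not toks[pos].isidentifier():
        return None
    result = IdentList([toks[pos]])
    pos += 1
    while pos + 1 < len(toks) and toks[pos] == "." and toks[pos + 1].isidentifier():
        result.name.append(toks[pos + 1])
        pos += 2
    return result, pos

def parse_module_df(toks: List[str]) -> ModuleDf:
    """Parse an optional `module a.b.c;` header into a ModuleDf struct."""
    module = ModuleDf()
    if toks and toks[0] == "module":
        parsed = parse_ident_list(toks, 1)
        if parsed is None:
            raise SyntaxError("expected module name")
        module.name, pos = parsed
        if pos >= len(toks) or toks[pos] != ";":
            raise SyntaxError("expected ';' after module name")
    return module

print(parse_module_df(["module", "std", ".", "stdio", ";"]).name.name)  # ['std', 'stdio']
```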

The problem was that when I tried to write complex rules all in one 
integrated block, things got hopelessly complicated:

BEGIN CODE
   class_declaration: // http://digitalmars.com/d/class.html
           ("auto"=is_auto|"scope"=is_scope)? "class" IDENT=name (template_parms_decl=template_parms)?
           (':' [("public"=is_public|"protected"=is_protected|"package"=is_package|"private"=is_private)? type=type,',']+=super_interfaces)?
           '{' /* HACK: *= is broken */ (decl_def+=decl_defs)? '}'
   ;
END CODE

The nonterminal above parses (nearly?) all of the class declarations 
you can find in the DMD sources or in dstress... but can you read it?

I'm not so much drawing conclusions ("this sort of grammar can never 
work") as looking for alternatives ("I wonder if...").  Tell me the 
details about your design!


