Overlapping functionality: IFTI, templates, is-expressions
Russell Lewis
webmaster at villagersonline.com
Wed Mar 19 21:44:34 PDT 2008
BCS wrote:
>> or!(':',
>> Parse_block, /* parses a nonterminal */
>> chain!("import", array!(IDENT, ','), ';'),
>> ... many other alternatives ...)
>> (delegate void(... auto-generated arg types ...)
>> {
>> ... process successful parse here ...
>> });
>
> Errk!! Ow, Ow, ow. Too much extra syntax! (for my taste) Mine works off a
> single string:
True! But what I'm attempting to do, at this point in the development,
is to develop an as-simple-as-I-can-get-it template parser library. As
I see it, the *next* step is to build a string parser which calls the
parser library. And, if I say so myself, I am getting pretty
close to "as dense as you can get without having to parse strings at
compile time."
Of course, if you have a parser library which is functional even if it's a
little clunky, then hopefully you could use that library to scan input
strings, build parse trees, and then generate parsers from that. :)
So introduce me to your parser a bit. How do you handle things like:
* Single character tokens (matching the . operator in D syntax, for
instance)
* Multi-character tokens (matching D keywords)
* Lexer-recognized tokens (IDENT, NUM, CHAR_LITERAL, etc.)
Also, how good are you at handling ambiguous grammars?
Is there any built-in way to handle repeated sequences of elements and
turn them into arrays?
Finally, how do you define the action-code?
As a little more background, I actually wrote a parser generator that
generated parsers from an extended-Bison syntax. It was slow, but
functioned, and the output that the parser produced was a set of
auto-generated D structs and arrays that mirrored the parse tree. I still
found it hard to read the grammar code. Here are the first few lines of the
grammar for D:
BEGIN CODE (IN MY BISON-DERIVED GRAMMAR)
%token IDENT %{char*%}
%token NUM %{char*%}
%token STRING %{char*%}
%token CHAR %{char*%}
%token SHEBANG %{char*%}
module_df: // http://digitalmars.com/d/module.html
SHEBANG? ("module" IDENT_list=name ';')? /* HACK: *= is
broken */ (decl_def+=decl_defs)?
;
IDENT_list:
[IDENT=ident,'.']+=name
;
END CODE
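To make "D structs and arrays that mirrored the parse tree" a bit more concrete, here is a minimal sketch of what such output might look like for the two rules above. It is only a guess: the struct and field names are read off the "=label" bindings in the grammar, `IDENT_list_elem` and `sampleModule` are made-up helper names, and the real generator's output may well have been shaped differently.

```d
// Hypothetical sketch only; not the actual output of the generator.

struct IDENT_list_elem
{
    string ident;            // from IDENT=ident (declared as char* in the grammar)
}

struct IDENT_list
{
    IDENT_list_elem[] name;  // from [IDENT=ident,'.']+=name
}

struct decl_def
{
    // fields omitted; decl_def is defined elsewhere in the grammar
}

struct module_df
{
    IDENT_list* name;        // null if the optional "module ... ;" clause is absent
    decl_def[] decl_defs;    // empty if there are no declarations
}

// Roughly what a file containing only "module foo.bar;" might parse to.
module_df sampleModule()
{
    auto name = new IDENT_list([IDENT_list_elem("foo"), IDENT_list_elem("bar")]);
    return module_df(name);
}
```

Under these assumptions, each optional clause becomes a nullable pointer and each `+=` binding becomes an array, so the struct layout follows the grammar rule almost one to one.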
The problem was that when I tried to write complex rules, all in one
integrated block, things got hopelessly complex:
BEGIN CODE
class_declaration: // http://digitalmars.com/d/class.html
("auto"=is_auto|"scope"=is_scope)? "class" IDENT=name
(template_parms_decl=template_parms)? (':'
[("public"=is_public|"protected"=is_protected|"package"=is_package|"private"=is_private)?
type=type,',']+=super_interfaces)? '{' /* HACK: *= is broken */
(decl_def+=decl_defs)? '}'
;
END CODE
The nonterminal above will parse (nearly?) all of the class declarations
that you can find in the DMD sources, or in dstress...but can you read it?
I'm not so much drawing conclusions ("this sort of grammar can never
work") as looking for alternatives ("I wonder if..."). Tell me the
details about your design!