Anyone interested in a Spirit for D?

Wed Oct 18 17:11:43 PDT 2006

Pragma wrote:
> Bill Baxter wrote:
>> [snip]
>  >
>> Though, you know, even thinking about Boost::Spirit, I have to wonder 
>> if it really is necessary.  From the intro it says that its primary 
>> use is "extremely small micro-parsers", not a full blown language 
>> processor. But if that's the target then the runtime overhead of 
>> translating the EBNF description to a parser would be pretty trivial.  
>> So I guess the real benefit of a compile-time parser-generator is that 
>> your grammar can be _verified_ at compile-time.
> 
>  From what I gather, that's the major benefit, other than a 
> "self-documenting design".  All the "prettiness" of using a near EBNF 
> syntax in C++ code gets you close enough to actual EBNF that it's 
> apparent what and how it functions.
> 
> However, the only problem with composing this as an EBNF compile-time 
> parser, is that you can't attach actions to arbitrary terminals without 
> some sort of binding lookup.  I'm not saying it's impossible, but it'll 
> be a little odd to use until we get some stronger reflection support.
> 
> But what you're suggesting could just as easily be a Compile-Time 
> rendition of Enki. It's quite possible to pull off.  Especially if you 
> digest the grammar one production at a time as to side-step any 
> recursion depth limitations when processing the parser templates. :)

Yes!  Sounds like we're thinking along the same lines here.  But if 
Walter's right, that the compile-time verification is not a big deal, 
then it would be even simpler.

Actually it sounds very similar to the way writing shader code for 
OpenGL/Direct3D works.  You have to compile the code to use it, but 
conveniently compilation is so fast that you can do it at run-time 
easily.  Or if you prefer, you can still precompile it.  What I like to 
do is set up my IDE to go ahead and precompile my shaders just so I can 
check for errors at compile time, but then I use the runtime compilation 
in the end anyway because that makes some things easier -- like 
modifying the code on the fly.

It actually works pretty well I think.  The only difference between 
shader code and grammar code is that shader code doesn't need to make 
any callbacks.  But callbacks aren't hard.

> auto grammar = new Parser(
>   Production!("Number ::= NumberPart {NumberPart}",
>     // binding attached to production ('all' is supplied by default?)
>     void function(char[] all){
>       writefln("Parsed Number: %s",all);
>     }
>   ),
>   Production!("NumberPart ::= Sep | Digit "),
>   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
>   Production!("Sep ::= '_' | ','")
> );
> 
> // call specifying start production
> grammar.parse("Number",myInput);

That's one way to do it, but I think you could also allow bindings to be 
attached after the fact:

  auto grammar = new Parser(
      "Number ::= NumberPart {NumberPart}
       NumberPart ::= Sep | Digit
       Digit ::= 0|1|2|3|4|5|6|7|8|9
       Sep ::= '_' | ','"
    );

  grammar.attach("Number",
      // binding attached to production ('all' is supplied by default?)
      void function(char[] all){
        writefln("Parsed Number: %s",all);
      });

This is _exactly_ how parameter binding works in shader code.  Just here 
the value we're binding is a function pointer instead of a texture 
coordinate or something.
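
To make the after-the-fact binding concrete, here is a minimal sketch in present-day D.  It is only an illustration under assumptions: `Grammar`, `attach`, `fire` and `demo` are made-up names, and nothing here actually parses input.  `fire` just stands in for the moment a real parser would finish matching a production.  The point is that the binding is looked up by name at run time, so a misspelled production name is caught when `attach` is called, not by the compiler.

```d
import std.algorithm.searching : findSplit;
import std.exception : enforce;
import std.string : splitLines, strip;

// Callback type: receives the text matched by the production.
alias Action = void delegate(string all);

// Hypothetical stand-in for the Parser discussed above.
class Grammar
{
    private string[string] bodies;  // production name -> right-hand side
    private Action[string] actions; // production name -> attached callback

    this(string source)
    {
        foreach (line; source.splitLines)
        {
            auto parts = line.findSplit("::=");
            if (parts[1].length == 0)
                continue; // no "::=" on this line, so not a production
            bodies[parts[0].strip] = parts[2].strip;
        }
    }

    // The 'soft' binding: the name is only checked at run time.
    void attach(string name, Action action)
    {
        enforce((name in bodies) !is null, "no such production: " ~ name);
        actions[name] = action;
    }

    // Stand-in for "the parser just matched `name` against `all`".
    void fire(string name, string all)
    {
        if (auto action = name in actions)
            (*action)(all);
    }
}

// Usage: bind a callback to "Number" and report what it received.
string demo()
{
    auto grammar = new Grammar(
        "Number ::= NumberPart {NumberPart}
         NumberPart ::= Sep | Digit
         Digit ::= 0|1|2|3|4|5|6|7|8|9
         Sep ::= '_' | ','");

    string seen;
    grammar.attach("Number", (string all) { seen = all; });
    grammar.fire("Number", "1,024");
    return seen;
}
```

Calling `attach` with a name that is not in the grammar throws, which is roughly what a shader API does when you ask for a parameter that doesn't exist.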

> Depending on how you'd like the call bindings to go, you could probably 
> go about as complex as what Enki lets you get away with.  But you'll 
> have to accept a 'soft' binding in there someplace, hence you lose the 
> type/name checking benefits of being at compile time.

I'll have to take your word for it.  You mean in Enki you can say that 
Number has to output something convertible to 'real'?

>> I wonder if it would be any easier to make a compile-time grammar 
>> verifier than a full blown parser generator?   Then just do the 
>> parser-generating at runtime.
> 
> Maybe I don't fully understand, but I don't think there's a gain there. 
>  If you've already gone through the gyrations of parsing the BNF 
> expression, it's hardly any extra trouble to do something at each step 
> of the resulting parse tree*.
> 
> (* of course template-based parsers use the call-tree as a parse-tree 
> but that's beside the point)

Yeh, I was just talking crap.  I thought maybe you might be able to save 
some bookkeeping if all you cared about was that the grammar made a 
valid tree, but didn't care about its output.  But probably it's the 
other way around.  Checking validity is the hard part, not making a tree.
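
For a sense of what even the easiest validity check involves, here is a rough sketch in present-day D that looks for just one kind of error: a name used on a right-hand side that is never defined.  `undefinedRules` is a made-up helper, and it assumes one production per line, quoted literals in single quotes, and rule names made of letters and digits.  A real verifier would have to check much more than this.

```d
import std.algorithm.searching : canFind, findSplit;
import std.ascii : isAlpha, isAlphaNum;
import std.string : splitLines, strip;

// Returns the names referenced on a right-hand side but never defined.
string[] undefinedRules(string source)
{
    // Pass 1: collect every production name and its right-hand side.
    string[string] bodies;
    foreach (line; source.splitLines)
    {
        auto parts = line.findSplit("::=");
        if (parts[1].length != 0)
            bodies[parts[0].strip] = parts[2];
    }

    // Pass 2: scan each right-hand side for names with no definition.
    string[] missing;
    foreach (name, rhs; bodies)
    {
        size_t i = 0;
        while (i < rhs.length)
        {
            if (rhs[i] == '\'')
            {
                // Skip a quoted literal such as '_' or ','.
                i++;
                while (i < rhs.length && rhs[i] != '\'')
                    i++;
                i++;
            }
            else if (isAlpha(rhs[i]))
            {
                // Read a whole rule name and look it up.
                size_t start = i;
                while (i < rhs.length && isAlphaNum(rhs[i]))
                    i++;
                auto word = rhs[start .. i];
                if (word !in bodies && !missing.canFind(word))
                    missing ~= word;
            }
            else
            {
                i++; // digits, '|', braces, whitespace
            }
        }
    }
    return missing;
}
```

With the Number grammar above this returns an empty array; drop the `Sep` line and it returns `["Sep"]`.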

--bb