Anyone interested in a Spirit for D?

pragma ericanderton at yahoo.com
Wed Oct 18 18:13:19 PDT 2006


Bill Baxter wrote:
> Pragma wrote:
>> Bill Baxter wrote:
>>> [snip]
>>
>>> Though, you know, even thinking about Boost::Spirit, I have to wonder 
>>> if it really is necessary.  From the intro it says that its primary 
>>> use is "extremely small micro-parsers", not a full blown language 
>>> processor. But if that's the target then the runtime overhead of 
>>> translating the EBNF description to a parser would be pretty 
>>> trivial.  So I guess the real benefit of a compile-time 
>>> parser-generator is that your grammar can be _verified_ at compile-time.
>>
>>  From what I gather, that's the major benefit, other than a 
>> "self-documenting design".  All the "prettyness" of using a near EBNF 
>> syntax in C++ code gets you close enough to actual EBNF that it's 
>> apparent what it does and how it works.
>>
>> However, the only problem with composing this as an EBNF compile-time 
>> parser, is that you can't attach actions to arbitrary terminals 
>> without some sort of binding lookup.  I'm not saying it's impossible, 
>> but it'll be a little odd to use until we get some stronger reflection 
>> support.
>>
>> But what you're suggesting could just as easily be a Compile-Time 
>> rendition of Enki. It's quite possible to pull off.  Especially if you 
>> digest the grammar one production at a time so as to side-step any 
>> recursion depth limitations when processing the parser templates. :)
> 
> Yes!  Sounds like we're thinking along the same lines here.  But if 
> Walter's right, that the compile-time verification is not a big deal, 
> then it would be even simpler.
> 
> Actually it sounds very similar to the way writing shader code for 
> OpenGL/Direct3D works.  You have to compile the code to use it, but 
> conveniently compilation is so fast that you can do it at run-time 
> easily.  Or if you prefer, you can still precompile it.  What I like to 
> do is set up my IDE to go ahead and precompile my shaders just so I can 
> check for errors at compile time, but then I use the runtime compilation 
> in the end anyway because that makes some things easier -- like 
> modifying the code on the fly.
> 
> It actually works pretty well I think.  The only difference between 
> shader code and grammar code is that shader code doesn't need to make 
> any callbacks.  But callbacks aren't hard.
> 
>> auto grammar = new Parser(
>>   Production!("Number ::= NumberPart {NumberPart}",
>>     // binding attached to production ('all' is supplied by default?)
>>     void function(char[] all){
>>       writefln("Parsed Number: %s",all);
>>     }
>>   ),
>>   Production!("NumberPart ::= Sep | Digit "),
>>   Production!("Digit ::= 0|1|2|3|4|5|6|7|8|9"),
>>   Production!("Sep ::= '_' | ','")
>> );
>>
>> // call specifying start production
>> grammar.parse("Number",myInput);
> 
> That's one way to do it, but I think you could also allow bindings to be 
> attached after the fact:
> 
>  auto grammar = new Parser(
>      "Number ::= NumberPart {NumberPart}
>       NumberPart ::= Sep | Digit
>       Digit ::= 0|1|2|3|4|5|6|7|8|9
>       Sep ::= '_' | ','"
>  );
> 
>  grammar.attach("Number",
>      // binding attached to production ('all' is supplied by default?)
>      void function(char[] all){
>        writefln("Parsed Number: %s",all);
>      });
> 
> This is _exactly_ how parameter binding works in shader code.  Just here 
> the value we're binding is a function pointer instead of a texture 
> coordinate or something.
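For what it's worth, the attach-after-the-fact scheme is easy to mock up at 
runtime.  Here's a rough Python sketch of the Number grammar above with a 
binding attached by production name -- the Parser/attach names and the 
hand-rolled matchers are just illustrative stand-ins, not Enki's or Spirit's 
actual API:

```python
# Sketch: build the grammar first, attach callbacks to productions later.
class Parser:
    def __init__(self):
        self.bindings = {}  # production name -> callback

    def attach(self, name, fn):
        """Bind a callback to a production after the grammar is built."""
        self.bindings[name] = fn

    def _fire(self, name, text):
        if name in self.bindings:
            self.bindings[name](text)

    # Hand-written matchers standing in for generated parser code.
    # Each returns the next input index on a match, or None on failure.
    def _sep(self, s, i):
        return i + 1 if i < len(s) and s[i] in "_," else None

    def _digit(self, s, i):
        return i + 1 if i < len(s) and s[i].isdigit() else None

    def _number_part(self, s, i):
        # NumberPart ::= Sep | Digit
        return self._sep(s, i) or self._digit(s, i)

    def parse(self, start, s):
        # Number ::= NumberPart {NumberPart}
        assert start == "Number"
        j = self._number_part(s, 0)
        if j is None:
            return False
        while True:
            k = self._number_part(s, j)
            if k is None:
                break
            j = k
        self._fire("Number", s[:j])  # invoke the attached binding
        return j == len(s)

grammar = Parser()
parsed = []
grammar.attach("Number", lambda all: parsed.append(all))
ok = grammar.parse("Number", "1_234")
```

The binding really is just a late-bound value looked up by name at parse 
time, which is exactly the "soft binding" trade-off mentioned below.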
> 
>> Depending on how you'd like the call bindings to go, you could 
>> probably go about as complex as what Enki lets you get away with.  But 
>> you'll have to accept a 'soft' binding in there someplace, hence you 
>> lose the type/name checking benefits of being at compile time.
> 
> I'll have to take your word for it.  You mean in Enki you can say that 
> Number has to output something convertible to 'real'?

Yes and no.  The parser generator has a good deal of flexibility 
built-in, including a pseudo-variant type that tries to perform 
conversions wherever possible.  For instance, if we re-wrote the 
production for Number like so:

Number
   = real handleNumber(whole,fraction)
   ::= (NumberPart {NumberPart}):whole '.'
       (NumberPart {NumberPart}):fraction;

... Enki would emit code that binds the chars traversed for 'whole' 
and 'fraction', and passes those on to a function called 'handleNumber' 
that returns a real.  That return value is passed up the parse chain so that 
other terminals can bind to it:

Foobar = writeMe(foo) ::= Number:foo;

And so on.
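To make that value flow concrete, here's a minimal Python sketch of the same 
chain.  handle_number and write_me are stand-ins for the bound actions, the 
generated matching code is reduced to a split on '.', and the pseudo-variant 
conversion is modeled with a plain float() call -- none of this is Enki's 
real emitted code:

```python
def handle_number(whole, fraction):
    # Bound chars for 'whole' and 'fraction' arrive as strings;
    # strip the separators and convert, returning a real.
    return float(whole.replace("_", "").replace(",", "") + "." + fraction)

def parse_number(text):
    # Stand-in for the generated Number production: bind the spans
    # either side of '.' and hand them to the production's action.
    whole, fraction = text.split(".")
    return handle_number(whole, fraction)

def write_me(foo):
    # Stand-in for Foobar's action: 'foo' is Number's return value,
    # already converted, bound via "Foobar = writeMe(foo) ::= Number:foo;"
    return "Parsed: %s" % foo

result = parse_number("1_234.5")            # -> 1234.5
message = write_me(parse_number("1_234.5"))
```

The point is just that each production's action returns a typed value that 
the next production up can bind by name.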

> 
>>> I wonder if it would be any easier to make a compile-time grammar 
>>> verifier than a full blown parser generator?   Then just do the 
>>> parser-generating at runtime.
>>
>> Maybe I don't fully understand, but I don't think there's a gain 
>> there.  If you've already gone through the gyrations of parsing the 
>> BNF expression, it's hardly any extra trouble to do something at each 
>> step of the resulting parse tree*.
>>
>> (* of course template-based parsers use the call-tree as a parse-tree 
>> but that's beside the point)
> 
> Yeh, I was just talking crap.  I thought maybe you might be able to save 
> some bookkeeping if all you cared about was that the grammar made a 
> valid tree, but didn't care about its output.  But probably it's the 
> other way around.  Checking validity is the hard part, not making a tree.
> 
> --bb



More information about the Digitalmars-d mailing list