Improving std.regex(p)

BCS none at anon.com
Sat Jun 19 18:05:19 PDT 2010


Hello Ben,

> The only reason they were interested in my lexer generator is that
> recursive descent lexing is pretty slow. So for more demanding
> grammars, a DFA based lexer is better. The lexer library allows you to
> create a lexer at runtime (which doesn't fit in so well with their
> model of doing everything at compile time), or you can generate code
> offline and then just add that to your project. This is why a compile
> time DFA lexer would be really interesting to them.
> 
> The biggest problem (as far as I can
> see as an observer) is the compile times.

I've never got a usable grammer for D to feed into it but I did get one (~200 
productions) that at least compiled: it took 7 (or 17 minutes, I forget) 
and 700MB or ram To compile (P-III 550MHz, >700MB RAM)

> This is where D could be
> really interesting overall (Hartmut has certainly got his eye on it
> and in fact I'm sure all the Boost people do).
> 
> For my part, I'd like to see an LR parser generator developed. I'd be
> happy with one that creates the parser at runtime (assuming decent
> performance), but if it can generate and compile code efficiently at
> compile time, so much the better! :-) I really like the idea of being
> able to switch the runtime/compile time behaviour.
> 

Given that most LR parsers are table driven, all that would be needed for 
a compile time system would be that the table generator be evaluateable at 
compile time. The result could be stored in a constant. The only tricky bits 
then would be handling the data values on the stack and the part to tack 
in the action rules and that could just be a naming convention:

class MyActions
{
    resultType Action_statement_if(/+ args +/) { /+ code +/ }
    resultType Action_statement_ifThen(/+ args +/) { /+ code +/ }
    resultType Action_addExpression_plus(/+ args +/) { /+ code +/ }
    resultType Action_addExpression_minus(/+ args +/) { /+ code +/ }
    resultType Action_addExpression_chain(/+ args +/) { /+ code +/ }
    // ...


    mixin Parser!(
`statment : "if" "(" expression ")" statment / if 
          | "if" "(" expression ")" statment "then" statment / ifThen 
          | "{" statment* "}" / block
          | expression ";" /expression;
addExpression : .....
`);
}

Parser!(string) would generate a function that just assumes the existence 
of functions of the given names.

> When it comes to the code generation part for the DFA regex engine in
> D, I'd be happy to talk to you further about the techniques you've
> employed regarding emit. I'm still completely new to D, so it'll take
> me a while! :-)
> 
> Cheers,
> 
> Ben
> 
-- 
... <IXOYE><





More information about the Digitalmars-d mailing list