Looking for champion - std.lang.d.lex

Nick Sabalausky a at a.a
Sat Oct 23 23:27:51 PDT 2010


"Walter Bright" <newshound2 at digitalmars.com> wrote in message 
news:ia0cfv$22kp$1 at digitalmars.com...
> Nick Sabalausky wrote:
>> Would Walter & co be interested in this? If not, I won't bother, but if 
>> so, then I may give it a shot.
>
> The problem is I never have used parser/lexer generators, so I am not 
> really in a good position to review it.

Understandable.

FWIW though, Goldie isn't really a lexer/parser generator per se. Traditional 
lexer/parser generators like lex/yacc or ANTLR will actually generate the 
source code for a lexer or parser. Goldie just has a single lexer and 
parser, both already pre-written. They're just completely data-driven:

Compared to the generators, Goldie's lexer is more like a general regex 
engine that simultaneously matches against multiple pre-compiled "regexes". 
By pre-compiled, I mean turned into a DFA - which is currently done by a 
separate non-source-available tool I didn't write, but I'm going to be 
writing my own version soon. By "regexes", I mean they're functionally 
regexes, but they're written in a much easier-to-read syntax than the 
typical PCRE.
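
To illustrate the "one generic lexer driven by data" idea, here's a minimal 
sketch in Python. It is not Goldie's actual code or table format: the 
transition table, the char_class helper and the token names are all made up 
for the example, and the DFA is written by hand rather than compiled from 
anything. The point is only that lex() itself knows nothing about the tokens; 
swap the tables and you get a different lexer.

```python
# Hypothetical character classes, used only to keep the hand-written
# table small. A real DFA would normally use character sets or ranges.
def char_class(ch):
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == " ":
        return "space"
    return "other"

# TRANSITIONS[state][char_class] -> next state. State 0 is the start state.
# This one DFA matches three "regexes" at once: identifiers, integers
# and runs of spaces.
TRANSITIONS = {
    0: {"letter": 1, "digit": 2, "space": 3},
    1: {"letter": 1, "digit": 1},
    2: {"digit": 2},
    3: {"space": 3},
}

# Accepting states and the (made-up) token name each one produces.
ACCEPTING = {1: "Identifier", 2: "Integer", 3: "Whitespace"}

def lex(text, transitions=TRANSITIONS, accepting=ACCEPTING):
    """Generic longest-match lexer, directed entirely by the tables."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = 0
        last_accept = None  # (end position, token name) of longest match
        i = pos
        while i < len(text):
            nxt = transitions.get(state, {}).get(char_class(text[i]))
            if nxt is None:
                break  # no transition: stop and use the last accept
            state = nxt
            i += 1
            if state in accepting:
                last_accept = (i, accepting[state])
        if last_accept is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        end, name = last_accept
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

Calling lex("abc 42 x9") walks the same loop for every token; only the table 
lookups differ.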

Goldie's parser is really just a rather typical (from what I understand) 
LALR parser. I don't know how much you know about LALR parsers, but the 
parser itself is naturally grammar-independent (at least as described in CS 
texts). Using an LALR parser involves converting the grammar completely into 
a table of states and lookaheads (single-token lookahead; unlike LL, any more 
than that is never really needed). The actual parser is then directed 
entirely by that table (much like how regexes are converted to data, i.e. a 
DFA, and then processed generically), so it's completely grammar-independent.
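
A rough sketch of what "directed entirely by that table" means, again in 
Python and again not Goldie's real code or table layout. The toy grammar 
(E -> E + n | n), the state numbers and the ACTION/GOTO dictionaries are 
assumptions worked out by hand for this example; a real tool would compute 
them from the grammar. The parse() loop never mentions the grammar directly.

```python
# Toy grammar, chosen for this sketch only:
#   0: S' -> E
#   1: E  -> E + n
#   2: E  -> n
# Each rule is (left-hand side, number of symbols on the right-hand side).
RULES = [
    ("S'", 1),  # never reduced; reaching it is the "accept" action
    ("E", 3),
    ("E", 1),
]

# Hand-built table: (state, lookahead token) -> action.
# "$" marks end of input.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}

# (state, nonterminal) -> state to enter after a reduction.
GOTO = {(0, "E"): 1}

def parse(tokens, action=ACTION, goto=GOTO, rules=RULES):
    """Generic table-driven LR loop with one token of lookahead.

    Returns the rule numbers reduced, in order.
    """
    stack = [0]  # stack of parser states
    stream = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        state, lookahead = stack[-1], stream[pos]
        entry = action.get((state, lookahead))
        if entry is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = rules[arg]
            del stack[-rhs_len:]  # pop one state per right-hand-side symbol
            stack.append(goto[(stack[-1], lhs)])
            reductions.append(arg)
        else:  # accept
            return reductions
```

For the input n + n, the loop reduces rule 2 and then rule 1. Handing the 
same function a different set of tables would parse a different grammar.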

And, of course, the actual lexer and parser can be 
optimized/rewritten/whatever with minimal impact on everything else.

If anyone's interested, further details are here(1):
http://www.devincook.com/goldparser/

Goldie does have optional code-generation capabilities, but it's entirely 
for the sake of providing a better statically-checked API tailored to your 
grammar (ex: to use D's type system to ensure at compile-time, instead of 
run-time, that token names are valid and that BNF rules you reference 
actually exist). It doesn't actually affect the lexer/parser in any 
non-trivial way.

(1): By that site's terminology, Goldie would technically be a "GOLD 
Engine", plus some additional tools. But my current work on Goldie will cut 
the actual "GOLD Parser Builder" program completely out of the loop (though 
it will still maintain compatibility with it for anyone who wants to use it).




