Looking for champion - std.lang.d.lex
Nick Sabalausky
a at a.a
Sat Oct 23 23:27:51 PDT 2010
"Walter Bright" <newshound2 at digitalmars.com> wrote in message
news:ia0cfv$22kp$1 at digitalmars.com...
> Nick Sabalausky wrote:
>> Would Walter & co be interested in this? If not, I won't bother, but if
>> so, then I may give it a shot.
>
> The problem is I never have used parser/lexer generators, so I am not
> really in a good position to review it.
Understandable.
FWIW though, Goldie isn't really a lexer/parser generator per se. Traditional
lexer/parser generators like lex/yacc or ANTLR will actually generate the
source code for a lexer or parser. Goldie just has a single lexer and
parser, both already pre-written. They're just completely data-driven:
Compared to the generators, Goldie's lexer is more like a general regex
engine that simultaneously matches against multiple pre-compiled "regexes".
By pre-compiled, I mean turned into a DFA - which is currently done by a
separate non-source-available tool I didn't write, but I'm going to be
writing my own version soon. By "regexes", I mean they're functionally
regexes, but they're written in a much easier-to-read syntax than the
typical PCRE.
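To make the data-driven idea concrete, here is a minimal sketch in Python. It is not Goldie's actual code or table format: the character classes, state numbers, and token names are all made up for illustration, and the DFA is written by hand instead of being compiled from a grammar. The point is only that the lexing loop knows nothing about the tokens; everything comes from the tables.

```python
# Hypothetical hand-written DFA tables (NOT Goldie's real format).

def char_class(ch):
    """Map a character to the (made-up) input class used by the table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch in " \t\n":
        return "space"
    return "other"

# state -> {character class -> next state}; state 0 is the start state.
TRANSITIONS = {
    0: {"letter": 1, "digit": 2, "space": 3},
    1: {"letter": 1, "digit": 1},
    2: {"digit": 2},
    3: {"space": 3},
}

# Accepting states and the token each one recognizes.
ACCEPTING = {1: "Identifier", 2: "Integer", 3: "Whitespace"}

def lex(text, transitions=TRANSITIONS, accepting=ACCEPTING):
    """Generic longest-match lexer driven entirely by the tables."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = 0
        last_accept = None  # (end offset, token name) of the longest match so far
        i = pos
        while i < len(text):
            nxt = transitions.get(state, {}).get(char_class(text[i]))
            if nxt is None:
                break  # no transition: the DFA is stuck
            state = nxt
            i += 1
            if state in accepting:
                last_accept = (i, accepting[state])
        if last_accept is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        end, name = last_accept
        tokens.append((name, text[pos:end]))
        pos = end
    return tokens
```

Calling lex("foo 42") walks the same loop for every token and returns the Identifier, Whitespace, and Integer tokens in order. Supporting a different language would mean swapping in different tables, not touching lex().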
Goldie's parser is really just a rather typical (from what I understand)
LALR parser. I don't know how much you know about LALR parsers, but the parser
itself is naturally grammar-independent (at least as described in CS texts).
Using an LALR parser involves converting the grammar completely into a table
of states and lookaheads (single-token lookahead; unlike LL, any more than
that is never really needed), and then the actual parser is directed entirely
by that table (much like how regexes are converted to data, i.e. a DFA, and
then processed generically), so it's completely grammar-independent.
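As a rough illustration of a table-directed parser, here is a small Python sketch. It is not Goldie's implementation or table layout: the grammar (E -> E + n | n), the state numbers, and the names ACTION, GOTO, RULES, and parse are all assumptions for this example, and the tables were worked out by hand. The loop in parse() is the generic part; it only looks things up.

```python
# Hypothetical hand-built LR tables for the toy grammar:
#   rule 1: E -> E + n
#   rule 2: E -> n
# "$" marks end of input.

# rule number -> (left-hand side, number of symbols on the right-hand side)
RULES = {
    1: ("E", 3),
    2: ("E", 1),
}

# (state, lookahead token) -> (action kind, argument)
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}

# (state, nonterminal) -> next state after a reduction
GOTO = {(0, "E"): 1}

def parse(tokens, action=ACTION, goto=GOTO, rules=RULES):
    """Generic shift-reduce loop; returns the rule numbers in reduction order."""
    stack = [0]                     # stack of parser states
    stream = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        state = stack[-1]
        lookahead = stream[pos]     # single-token lookahead
        entry = action.get((state, lookahead))
        if entry is None:
            raise ValueError(f"unexpected {lookahead!r} at token {pos}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = rules[arg]
            del stack[-length:]     # pop one state per right-hand-side symbol
            stack.append(goto[(stack[-1], lhs)])
            reductions.append(arg)
        else:                       # accept
            return reductions
```

For the input n + n + n, parse() reports the reductions 2, 1, 1: the first n becomes an E, and each following "+ n" is folded in by rule 1. A different grammar would need different tables but the same loop.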
And, of course, the actual lexer and parser can be
optimized/rewritten/whatever with minimal impact on everything else.
If anyone's interested, further details are here(1):
http://www.devincook.com/goldparser/
Goldie does have optional code-generation capabilities, but it's entirely
for the sake of providing a better statically-checked API tailored to your
grammar (ex: to use D's type system to ensure at compile-time, instead of
run-time, that token names are valid and that BNF rules you reference
actually exist). It doesn't actually affect the lexer/parser in any
non-trivial way.
(1): By that site's terminology, Goldie would technically be a "GOLD
Engine", plus some additional tools. But, my current work on Goldie will cut
that actual "GOLD Parser Builder" program completely out-of-the-loop (but it
will still maintain compatibility with it for anyone who wants to use it).