Goldie Parsing System v0.5 - Speed
Nick Sabalausky
a at a.a
Wed May 18 14:29:09 PDT 2011
"Stephan" <spam at extrawurst.org> wrote in message
news:ir05te$tbd$1 at digitalmars.com...
> On 18.05.2011 05:47, Nick Sabalausky wrote:
>> Goldie Parsing System v0.5 is now out. This version focuses mainly on
>> speed improvements.
>>
>
> Great work.
>
Thanks :)
> Is it possible to generate a parser for D with this ?
>
It should be possible to write a grammar that handles most of D. But there
would be some awkwardness and corner cases that, to really be handled right,
would need some enhancements I haven't put in yet.
For example:
- Nested comments aren't officially supported yet. GOLD (which Goldie is
based on) will support them in v4.2, which is currently in beta
(http://www.devincook.com/goldparser/v4.2.htm). I intend to make Goldie
fully compatible with all the new GOLD v4.2 features, but just haven't
gotten to them yet. In the meantime, you can lex the D source first, then
go through the resulting token array removing everything from a "/+" token
to its matching "+/" token (there will be a bunch of junk in between,
including some error tokens; you can just rip it all out), and then send
that through the parser.
- Another comment-related thing that'll be fixed with the v4.2 enhancements:
Currently, GOLD and Goldie handle (non-nested) block comments by actually
lexing what's inside the comment (and ignoring any errors). Normally this
works out fine, but it does lead to some occasional edge-cases where the
"*/" isn't handled right.
- D relies on certain disambiguation rules. For instance, "a*b" could be
either a multiplication expression or a pointer declaration. D handles this
by saying that if something can be parsed as either an expression or a
declaration, it is always interpreted as a declaration. Goldie (and GOLD)
currently don't have any conflict resolution. If you try to create a grammar
that has such an ambiguity, you'll just get a reduce-reduce or shift-reduce
conflict error. The way to work around this is to design the grammar to
completely conflate the two notions, so instead of having <Expression> and
<Declaration>, you'd just have something like <ExprOrDecl>. Unfortunately,
this isn't always easy: it tends to obfuscate the grammar, it makes the
nonterminals less meaningful, and it creates much more work for your
semantics pass. I do intend to solve this, but it'll probably be a very
non-trivial matter. More discussion (possibly a bit technical) on this
issue is here:
http://groups.google.com/group/gold-parsing-system/browse_thread/thread/5959e0cfef76ce68
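The nested-comment workaround from the first item above amounts to a single pass over the token array with a nesting-depth counter. Below is a minimal sketch in Python, assuming (hypothetically) that the lexer emits "/+" and "+/" as separate tokens and that tokens can be compared by their text alone; a real Goldie token stream is a D data structure carrying more information, and the function name here is made up for illustration.

```python
from typing import List, Sequence

# Assumed token texts for D's nesting comment delimiters.
OPEN, CLOSE = "/+", "+/"


def strip_nested_comments(tokens: Sequence[str]) -> List[str]:
    """Hypothetical helper: drop every token from a "/+" through its
    matching "+/", honoring nesting. Tokens are plain strings here."""
    result: List[str] = []
    depth = 0
    for tok in tokens:
        if tok == OPEN:
            depth += 1          # entering a (possibly nested) comment
        elif tok == CLOSE and depth > 0:
            depth -= 1          # leaving one nesting level; the "+/" is dropped
        elif depth == 0:
            result.append(tok)  # ordinary token outside any comment
        # anything else is junk inside a comment (error tokens included): skip it
    if depth != 0:
        raise ValueError("unterminated /+ comment")
    return result


if __name__ == "__main__":
    toks = ["int", "x", "/+", "a", "/+", "b", "+/", "c", "+/", ";"]
    print(strip_nested_comments(toks))  # ['int', 'x', ';']
```

A stray "+/" outside any comment is deliberately left in the stream so that the parser, not this pre-pass, reports it as an error.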
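To illustrate how conflating <Expression> and <Declaration> pushes the decision into the semantics pass, here is a toy sketch in Python. The node classes and the resolve function are hypothetical, and the check uses Python's str.isidentifier() as a rough stand-in for "is a D identifier". The pass simply applies D's rule that a construct which can be a declaration is treated as one.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ExprOrDecl:
    """Hypothetical conflated node for `lhs * rhs;` produced by a grammar
    that does not distinguish expressions from declarations."""
    lhs: str
    rhs: str


@dataclass
class PointerDecl:
    type_name: str
    var_name: str


@dataclass
class MulExpr:
    lhs: str
    rhs: str


def resolve(node: ExprOrDecl) -> Union[PointerDecl, MulExpr]:
    # Declarations take precedence: `T * x;` is a valid declaration whenever
    # both sides are identifiers (approximated here with str.isidentifier()).
    if node.lhs.isidentifier() and node.rhs.isidentifier():
        return PointerDecl(node.lhs, node.rhs)
    # Otherwise (e.g. `3 * b;`) it can only be a multiplication.
    return MulExpr(node.lhs, node.rhs)
```

Even in this tiny case the semantics pass has to re-check structure the parser would normally have validated, which is the extra work mentioned above.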
FWIW, Goldie does include a lex-only grammar for D2, which could be used as
a starting point. It's possible I got some edge cases wrong regarding the
decimal literals, and the grammar is currently ASCII-only, though that can
easily be changed. It's here:
http://www.dsource.org/projects/goldie/browser/tags/v0.5/lang/dlex.grm