Pegged: Syntax Highlighting

Philippe Sigaud philippe.sigaud at gmail.com
Sat Mar 17 13:53:30 PDT 2012


On Sat, Mar 17, 2012 at 18:11, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:

>> The D grammar is a 1000-line / hundreds of rules monster. I finished
>> writing it and am now crushing bugs.
>> God, that generates a 10_000 line module to parse it. I should
>> simplify the code generator somewhat.
>
>
> Science is done. Welcome to implementation :o).

Hey, it's only 3,000 lines now :) Coming from a thousand-line
grammar, that's not much of an inflation.


> I can't say how excited I am about this direction. I have this vision of
> having a D grammar published on the website that is actually "it", i.e. the
> same exact grammar is used by a validator that goes through all of our test
> suite. (The validator wouldn't do any semantic checking.) The parser
> generator _and_ the reference D grammar would be available in Phobos, so for
> anyone it would be dirt cheap to parse some D code and wander through the
> generated AST. The availability of a reference grammar and parser would be
> golden to a variety of D toolchain creators.

Indeed, but I fear the D grammar is a bit too complex to be easily
walked. Now that I read the output, I realize that '1' is parsed as a
leaf 10 levels deep!
Compared to Lisp, it's... not in the same league, to say the least. I
will look into drastically simplifying the parse tree.
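
One way such a simplification could work is to collapse every chain
of single-child nodes into the node at the bottom of the chain. The
sketch below is only an illustration under assumptions: Node,
simplify and depth are hypothetical names, not Pegged's actual
parse-tree API, and the rule names are merely examples of the kind of
nesting involved.

```d
import std.stdio;

// Hypothetical parse-tree node; Pegged's real node type may differ.
class Node
{
    string name;      // rule name, or the matched text for a leaf
    Node[] children;

    this(string name, Node[] children = null)
    {
        this.name = name;
        this.children = children;
    }
}

// Replace every chain of single-child nodes by the last node of the chain.
Node simplify(Node n)
{
    while (n.children.length == 1)
        n = n.children[0];
    foreach (ref child; n.children)
        child = simplify(child);
    return n;
}

// Number of nodes on the longest path from n down to a leaf.
size_t depth(Node n)
{
    size_t deepest = 0;
    foreach (child; n.children)
    {
        immutable d = depth(child);
        if (d > deepest)
            deepest = d;
    }
    return deepest + 1;
}

void main()
{
    // The literal 1 buried under a few nested rules (illustrative names).
    Node tree = new Node("1");
    foreach (rule; ["PrimaryExpression", "UnaryExpression",
                    "MulExpression", "AddExpression", "Expression"])
        tree = new Node(rule, [tree]);

    writeln(depth(tree));            // depth of the raw tree
    writeln(depth(simplify(tree)));  // depth after collapsing the chain
}
```

This version throws away the names of the intermediate rules, which
may or may not be acceptable; a real implementation would probably
let the grammar author choose which nodes to keep.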

Does anyone have experience with other languages similar to D that
offer AST walking? Doesn't C# have something like this?
(I'll have a look at Scala macros.)

> Just to gauge interest:
>
> 1. Would you consider submitting your work to Phobos?

Yes, of course. It's already Boost-licensed.
Judging by the review process for other modules, a review would most
certainly put the code in great shape. But then, it's far from being
submittable right now.


> 2. Do you think your approach can generate parsers competitive with
> hand-written ones? If not, why?

Right now, no, if only because I haven't taken any steps to make it
fast or to limit its RAM consumption.
After applying some ideas I have, I don't know. There are many people
here who are parser-aware and could help make the code faster. But at
the core, to allow mutually recursive rules, the design uses classes:

class A : someParserCombinationThatMayUseA { ... }

Which means A.parse (a static method) is just typeof(super).parse
(also static, and so on). Does that entail any crippling disadvantage
compared to a hand-written parser?
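
To make the question concrete, here is a minimal sketch of the
pattern, assuming a much simplified design: Result, Lit, Seq, Opt and
Parens are hypothetical names for illustration, not Pegged's actual
combinators. The rule class names itself in its own base class, and
the parse functions have explicit return types so that nothing has to
be inferred through the recursion.

```d
import std.stdio;

// Result of a parse attempt: success flag and the unconsumed input.
struct Result
{
    bool ok;
    string rest;
}

// Matches the fixed string s.
class Lit(string s)
{
    static Result parse(string input)
    {
        if (input.length >= s.length && input[0 .. s.length] == s)
            return Result(true, input[s.length .. $]);
        return Result(false, input);
    }
}

// Matches A, then B, then C; consumes nothing on failure.
class Seq(A, B, C)
{
    static Result parse(string input)
    {
        auto r = A.parse(input);
        if (r.ok) r = B.parse(r.rest);
        if (r.ok) r = C.parse(r.rest);
        return r.ok ? r : Result(false, input);
    }
}

// Matches P if possible, otherwise succeeds without consuming input.
class Opt(P)
{
    static Result parse(string input)
    {
        auto r = P.parse(input);
        return r.ok ? r : Result(true, input);
    }
}

// Parens <- '(' Parens? ')'
// The rule refers to itself through its own base class.
class Parens : Seq!(Lit!"(", Opt!Parens, Lit!")") {}

void main()
{
    writeln(Parens.parse("((()))").ok);  // balanced input
    writeln(Parens.parse("(()").ok);     // missing a closing paren
}
```

In this sketch every call is a static call resolved at compile time,
so there is no virtual dispatch and no object is ever allocated; the
classes only serve as names that can refer to each other.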


Philippe

