Let's stop parser Hell

Sat Jul 7 09:37:54 PDT 2012

On Saturday, 7 July 2012 at 16:27:00 UTC, Philippe Sigaud wrote:
> I added dstrings because
>
> 1- at the time (a few months ago), the lists here were awash in
> UTF-32 discussions and I thought that'd be the way to go anyway
> 2- other D parsing libraries seemed to go to UTF32 also (CTPG)
> 3- I wanted to be able to parse mathematical notation like nabla,
> derivatives, etc. which all have UTF32 symbols.

I propose switching the code to templates parameterized on a 
string type S, constrained with if (isSomeString!S), everywhere. 
Client code would first determine the source encoding, and then 
instantiate the parsers with the matching string type. This is 
not a trivial change, but I'm willing to help implement it.
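
To make the proposal concrete, here is a minimal sketch of a parser 
function templated on the string type. Result and literal are 
hypothetical names made up for illustration; they are not Pegged's 
actual API, and a real port would have to touch every rule.

```d
import std.algorithm.searching : startsWith;
import std.traits : isSomeString;

/// Hypothetical result type (NOT Pegged's ParseTree).
struct Result(S) if (isSomeString!S)
{
    bool success;
    S matched; // slice of the input that was consumed
    S rest;    // input remaining after the match
}

/// Hypothetical parser: succeeds if `input` begins with `expected`.
/// Works for string, wstring and dstring alike; lengths are counted
/// in code units of whichever type S is.
Result!S literal(S)(S input, S expected) if (isSomeString!S)
{
    if (input.startsWith(expected))
        return Result!S(true,
                        input[0 .. expected.length],
                        input[expected.length .. $]);
    return Result!S(false, S.init, input);
}

unittest
{
    // UTF-8 source: nabla (U+2207) takes three code units.
    auto r8 = literal("∇f", "∇");
    assert(r8.success && r8.matched.length == 3 && r8.rest == "f");

    // UTF-32 source: the same symbol is a single code unit.
    auto r32 = literal("∇f"d, "∇"d);
    assert(r32.success && r32.matched.length == 1 && r32.rest == "f"d);
}
```

The client picks string, wstring or dstring once, after detecting 
the source encoding, and everything downstream is instantiated for 
that type. The unittest block can be run with something like 
dmd -unittest -main -run file.d.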

> Note that PEG does not require packrat parsing, even though it 
> was developed to use it. I think it's a historical 'accident' 
> that put the two together: Bryan Ford's thesis used the two 
> together.
>
> Note that many PEG parsers do not rely on packrat (Pegged does 
> not).
> There are a bunch of articles on Bryan Ford's website by a guy
> writing a PEG parser for Java, and who found that storing the 
> last rules was enough to get a slight speed improvement, but 
> that doing any more storage was detrimental to the parser's 
> overall efficiency.

That's great! Anyway, I want to understand the advantages and 
limitations of both Pegged and ANTLR, and probably study some 
more techniques. Such research consumes a lot of time but can be 
done incrementally along with development.