Let's stop parser Hell
Roman D. Boiko
rb at d-coding.com
Sat Jul 7 09:37:54 PDT 2012
On Saturday, 7 July 2012 at 16:27:00 UTC, Philippe Sigaud wrote:
> I added dstrings because
>
> 1- at the time (a few months ago), the lists here were awash in UTF-32
> discussions and I thought that'd be the way to go anyway
> 2- other D parsing libraries seemed to go to UTF-32 also (CTPG)
> 3- I wanted to be able to parse mathematical notation like nabla,
> derivatives, etc., which all have UTF-32 symbols.
I propose switching the code to be generic over the string type, i.e.
S if (isSomeString!S), everywhere. Client code would first determine
the source encoding and then instantiate the parsers with the matching
string type (string, wstring or dstring). This is not a trivial change,
but I'm willing to help implement it.
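A minimal sketch of what the S if (isSomeString!S) approach could look like for a single parsing primitive, assuming Phobos' std.traits.isSomeString and std.utf.decode. The name matchChar is hypothetical and is not Pegged's API; it only shows that one template can serve all three encodings as long as positions are counted in code units of S.

```d
import std.traits : isSomeString;
import std.utf : decode;

/// Hypothetical helper, not part of Pegged: if `input` starts with the code
/// point `c`, return the number of code units that code point occupies in S;
/// otherwise return 0. The same template serves string, wstring and dstring.
size_t matchChar(S)(S input, dchar c)
    if (isSomeString!S)
{
    if (input.length == 0)
        return 0;
    size_t index = 0;
    // decode reads one full code point and advances index past its code
    // units (it throws a UTFException on malformed input).
    immutable dchar first = decode(input, index);
    return first == c ? index : 0;
}
```

With U+2207 NABLA, matchChar("∇f", '∇') returns 3 (UTF-8 code units), while matchChar("∇f"w, '∇') and matchChar("∇f"d, '∇') both return 1, so a parser built this way would advance its position correctly whichever string type the client instantiated it with.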
> Note that PEG does not require packrat parsing, even though it was
> developed to use it. I think it's a historical 'accident' that put
> the two together: Bryan Ford's thesis used both.
>
> Note that many PEG parsers do not rely on packrat (Pegged does not).
> There are a bunch of articles on Bryan Ford's website by a guy
> writing a PEG parser for Java, who found that storing the last rules
> was enough to get a slight speed improvement, but that storing
> anything more was detrimental to the parser's overall efficiency.
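To make the contrast with full packrat concrete, here is a hypothetical sketch (not Pegged's code and not the Java parser's) of one reading of "storing the last rules": a hand-written PEG recognizer where the rule Digits keeps a single memo slot holding only its most recent (position, result) pair, instead of a packrat table with an entry for every rule at every position. The toy grammar and all names are assumptions chosen for illustration.

```d
/// Hypothetical PEG recognizer for the toy grammar
///     Start  <- Digits 'x' / Digits 'y'
///     Digits <- [0-9]+
/// Results are end positions; -1 means the rule failed.
struct Parser
{
    string input;
    size_t digitsCalls; // how many times Digits actually scanned the input

    // One-entry memo slot for Digits: only its most recent result is kept,
    // unlike a packrat table that keeps every (rule, position) result.
    private bool haveLast;
    private size_t lastPos;
    private ptrdiff_t lastEnd;

    // Digits <- [0-9]+
    ptrdiff_t digits(size_t pos)
    {
        if (haveLast && lastPos == pos)
            return lastEnd; // cache hit: reuse the previous result
        ++digitsCalls;
        size_t i = pos;
        while (i < input.length && input[i] >= '0' && input[i] <= '9')
            ++i;
        immutable ptrdiff_t result = i > pos ? cast(ptrdiff_t) i : -1;
        haveLast = true;
        lastPos = pos;
        lastEnd = result;
        return result;
    }

    // Matches a single literal character.
    ptrdiff_t literal(size_t pos, char c)
    {
        return pos < input.length && input[pos] == c
            ? cast(ptrdiff_t)(pos + 1) : -1;
    }

    // Start <- Digits 'x' / Digits 'y'
    ptrdiff_t start(size_t pos)
    {
        // First alternative: Digits 'x'
        auto end = digits(pos);
        if (end >= 0)
        {
            auto afterX = literal(cast(size_t) end, 'x');
            if (afterX >= 0)
                return afterX;
        }
        // Second alternative backtracks to the same position, so Digits
        // is answered from the memo slot instead of being rescanned.
        end = digits(pos);
        if (end >= 0)
            return literal(cast(size_t) end, 'y');
        return -1;
    }
}
```

On input "123y", Parser("123y").start(0) returns 4 and digitsCalls stays at 1; without the memo slot, Digits would scan the input twice. The slot costs a few fields per rule, whereas a packrat table grows with the input length, which fits the reported finding that more storage stopped paying off.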
That's great! Anyway, I want to understand the advantages and
limitations of both Pegged and ANTLR, and probably study some
more techniques. Such research takes a lot of time, but it can be
done incrementally alongside development.
More information about the Digitalmars-d mailing list