Pegged, From EBNF to PEG

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Mar 17 07:48:41 PDT 2012


On 17.03.2012 18:11, Philippe Sigaud wrote:
> On Sat, Mar 17, 2012 at 10:09, Dmitry Olshansky<dmitry.olsh at gmail.com>  wrote:
>
>> Ok, let's agree on fact that semantically
>> a* is :
>> As<- a As / a
>>
>> and a*? is this:
>> As<- a / a As
>>
>> Now that's local ;)
>
> It's local, yes. But the pb is with   Expr<- A* B C D, when D fails.
> PEG sequences don't backtrack.

I'd argue they do. As I see it as:
Expr <- As B C D / B C D
As <- A / A As
(or use an epsilon production for As, is it allowed in pegged ?)

>
>
>>> I remember trying compile-time backtracking in another similar library
>>> in D 1-2 years ago and getting lots of pb. I might add that in Pegged,
>>> but I don't know the consequences. How do you do that in std.regex?
>>
>>
>>
>> There are two fundamental ways to get around the problem
> (snip)
>
> Thanks for the explanations. Does std.regex implement this?
>

It does the second way that I haven't told about, and had hard time to 
implement in C-T :)
And the simple version of what I presented (i.e. stack stuff) is used 
in CT-regex and as fallback in R-T. The problem with doing a bitmap 
memoization is that regex often used to _search_ things in large inputs. 
Some compromise and/or thresholds have to be estimated and found.
Parsing on the contrary guaranties that the whole input is used or in 
error, hence bitmap is not wasted.

>
>> Now if we put aside all crap talk about mathematical purity and monads, and
>> you name it, a packrat is essentially this, a dynamic programming trick
>> applied to recursive top-down parsing. The trick could be extended even more
>> to avoid extra checks on input characters, but the essence is this
>> "memoization".
>
> I plan to store indices anyway. I might add this in the future. A read
> something on using PEGs to parse Java and C and there was no need to
> do a complete memoization: storing the last parse result as a
> temporary seemed to be enough.
>

Well even straight no-memorization is somehow enough, it's just a way to 
ensure no extra work is done. C & Java are not much of backtracky grammars.

>>> (nice article btw, I learnt some about regexes)
>>
>>
>> Thanks, I hope it makes them more popular.
>> Might as well keep me busy fixing bugs :)
>
> As people said on Reddit, you should make more whooping sounds about
> CT-regex. That was what wowed me and pushed me to tackle CT-parsing
> again.

It's just I'm also well aware of how much luck it (still) takes to fly ;)
The asserts that compare C-T vs R-T go off too often for my taste, other 
then this the memory usage during compile is too high.
I recall once dmd had GC re-enabled for brief period, dmd used no more 
then ~64Mb doing real crazy ct-regex stuff.

OK, back to C-T parsing, I have one crazy idea that I can't get away 
from - add operator precedence grammar into the mix. From what I observe 
it should integrate into PEG smoothly. Then it would make 
"military-grade" hybrid that uses operator precedence parsing of 
expressions and the like. Few more tricks and it may beat some
existing parser generators.

See this post, where I tried to describe that idea early on:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=159562

I might catch spare time to go about adding it myself, the only tricky 
thing is to embed plain semantic actions, as AST generation would be 
more or less the same.

-- 
Dmitry Olshansky


More information about the Digitalmars-d-announce mailing list