Pegged, From EBNF to PEG

Alex Rønne Petersen xtzgzorex at gmail.com
Tue Mar 13 10:05:46 PDT 2012


On 13-03-2012 17:17, Dmitry Olshansky wrote:
> On 12.03.2012 17:45, bls wrote:
>> On 03/13/2012 04:28 AM, Dmitry Olshansky wrote:
>>> On 12.03.2012 16:43, bls wrote:
>>>> On 03/10/2012 03:28 PM, Philippe Sigaud wrote:
>>>>> Hello,
>>>>>
>>>>> I created a new Github project, Pegged, a Parsing Expression Grammar
>>>>> (PEG) generator in D.
>>>>>
>>>>> https://github.com/PhilippeSigaud/Pegged
>>>>>
>>>>> docs: https://github.com/PhilippeSigaud/Pegged/wiki
>>>>
>>>> Just WOW!
>>>>
>>>> Nice to have on your WIKI would be a EBNF to PEG sheet.
>>>>
>>>> Wirth EBNF Pegged
>>>> A = BC. A <- B C
>>>> A = B|C. A <- C / C
>>>
>>> Maybe A <- B / C. And even then it's not exactly equivalent if the
>>> grammar was ambiguous.
>>> Imagine: B <- a, C <- aa
>> PEG is pretty new to me. Can you elaborate a bit ?
>
> PEG defines order of alternatives, that is pretty much like a top-down
> recursive descent parser would parse it. Alternatives are tried from
> left to right, if first one fails, it tries next and so on.
> In an example I give B is always picked first and so C is never ever
> looked at.
>
> Somewhat less artificial example:
> Literal <- IntL| FloatL
> FloatL <- [0-9]+(.[0-9]+)?
> IntL <- [0-9]+
>
> If you change it to: Literal <- FloatL| IntL then integer literals would
> get parsed as floating point.
>
>>>
>>
>> My mistake.. cleaned up stuff..
>>
>> Pegged Wirth EBNF
>>
>> Sequence
>> A <- B C A = BC.
>>
>> B or C
>> A <- B / C A = B|C.
>>
>> Zero or one B
>> A <- B? A = [B].
>>
>> Zero or more Bs
>> A <- B* A = {B}.
>>
>> One or more Bs
>> A <- B+ Not available
>>
>> PEG description of EBNF
>>
>> EBNF <- Procuction+
>> Production <- Identifier '=' Expression '.'
>> Expression <- Term ( '|' Term)*
>> Term <- Factor Factor*
>> Factor <- Identifier / Literal / '[' Expression ']' / '{' Expression '}'
>> / '(' Expression ')'
>> lowerCase <- [a-z]
>> upperCase <- [A-Z]
>> Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*
>
> Why not:
> Identifier <- [a-zA-Z]+

That was an illustrative example from the Pegged docs. But yeah, you 
should just use a range; reads nicer.

>
>> Literal <- ("'" .+ "'") / ('"' .+ '"')
>>
> This needs escaping. Plain '.+' in pattern asks for trouble 99% of time.
>
>> Still not sure if this is correct. Especially :
>> Term <- Factor Factor*
>>
>>
>> Another thing I never really understand is the "production" order, In
>> other words : Why not top down ..
>> Start :
>> lowerCase <- [a-z]
>> upperCase <- [A-Z]
>> Identifier <- (lowerCase / upperCase) (lowerCase / upperCase)*
>
>> ....
>>
>> End :
>> EBNF <- Procuction+
>>
>> where End is Root..
>
> In fact grammars are usually devised the other way around, e.g.
> Start:
> Program -> ...
> Ehm... what the whole program is exactly ? Ok, let it be Declaration*
> for now. What kind of declarations do we have? ... and so on. Latter
> grammars get tweaked and extended numerous times.
>
> At any rate production order has no effect on the grammar, it's still
> the same. The only thing of importance is what non-terminal considered
> final (or start if you are LL-centric).
>
>>
>> TIA, Bjoern
>
>


-- 
- Alex


More information about the Digitalmars-d-announce mailing list