D parsing

Chad Joan chadjoan at gmail.com
Wed Nov 6 00:08:17 PST 2013


On Tuesday, 5 November 2013 at 14:54:34 UTC, Dmitry Olshansky 
wrote:
>
> I was also toying with the idea of exposing Builder interface 
> for std.regex. But push/pop IMHO are better be implicitly 
> designed-out:
>
> auto re = 
> atom('x').star(charClass(unicode.Letter),atom('y')).build();
>
> ... and letting the nesting be explicit.
>
> Is the same as:
> auto re = regex(`x(?:\p{L}y)*`);
>
> Aimed for apps/libs that build regular expressions anyway and 
> have no need in textual parser.
>

Interesting.  I like how it induces some amount of static 
verification, though I worry that it could harm procedural 
generation of grammars.  It would be difficult, for instance, to 
use that API to do the equivalent of pushing an atom in one 
function and popping it in another.

I wonder if we are at different levels of abstraction.  The 
example you give seems like it requires the API to remember, in a 
structured way, all of the information presented by the 
call-chaining and call-nesting.  I might implement something like 
that with a stateful "builder" object under the hood.  However, 
the example you give seems like it is closer to what a regex 
engine would morph an expression into, thus making it a higher 
abstraction.

>> That snippet would create a parser that recognizes the grammar 
>> 'x' (
>> 'y'? ).
>> The current fledgling implementation creates this parser:
>> http://pastebin.com/MgSqWXE2
>>
>> Of course, no one would be expected to write grammars like 
>> that. It
>> would be the job of tools like Pegged or std.regex to package 
>> it up in
>> nice syntax that is easy to use.
>
> I thought to provide some building blocks for that with new 
> std.uni. Not quite everything I wanted, but now at least there 
> is one set of wheels less to reinvent.
>

I haven't looked at std.uni earnestly yet, but if it succeeds at 
making that unicode/utf jungle manageable, then I will be 
incredibly thankful.

> [snip]
>
>> Another fun thought: PEGs can have look-behind that includes 
>> any regular
>> elements without any additional algorithmic complexity. Just 
>> take all of
>> the look-behinds in the grammar, mash them together into one 
>> big
>> regular-expression using regular alternation (|), and then 
>> have the
>> resulting automaton consume in lock-step with the PEG parser.  
>> Whenever
>> the PEG parser needs to do a lookbehind, it just checks to see 
>> if the
>> companion automaton is in a matching state for the capture it 
>> needs.
>
> Sounds quite slow to do it "just in case". Also complete DFAs 
> tend to be mm quite big.
>

I was envisioning it being done lazily and strictly as-needed, if 
I even got around to doing it at all.

> What ANTLR does is similar technique - a regular lookahead to 
> resolve ambiguity in the grammar (implicitly). A lot like LL(k) 
> but with unlimited length (so called LL(*)). Of course, it 
> generates LL(k) disambiguation where possible, then LL(*), 
> failing that the usual backtracking.
>

Neat.

>>
>> *sigh*, I feel like I could write a paper on this stuff if I 
>> were in
>> grad school right now.  Alas, I am stuck doing 50-60 hours a 
>> week of
>> soul-sucking business programming.
>
> I heard Sociomantic is hiring D programmers for coding some 
> awesome stuff, you may as well apply :)
>

Tempting.

And it seems even Facebook is joining the market now, which is 
news to me.

>> Well, then again, my understanding
>> is that even though I can think of things that seem like they 
>> would make
>> interesting topics for publishable papers, reality would have 
>> the profs
>> conscript me to do completely different things that are 
>> possibly just as
>> inane as the business programming.
>>
> Speaking for my limited experience - at times it's like that.
>

Yay, someone's observations corroborate mine!
...
Crap, someone observations corroborate mine :(

;)

>> I worry that the greater threat to good AST manipulation tools 
>> in D is a
>> lack of free time, and not the DMD bugs as much.
>
> Good for you I guess, my developments in related area are 
> blocked still :(
>

Well, hopefully you've got the wind at your back now.


More information about the Digitalmars-d mailing list