D parsing

Tue Nov 5 06:54:14 PST 2013

05-Nov-2013 10:36, Chad Joan пишет:
> On Friday, 1 November 2013 at 21:22:46 UTC, Andrei Alexandrescu wrote:
>> ...
>> Bugs stopping Pegged from going forward should receive high priority.
>> I also encourage others to participate to that and similar work.
>>
>>
>> Andrei
>
[snip]

> I was working on a parser-engine-as-a-library that could be used to as
> optimized internals for Pegged, as well as any other tools that need to
> recognize these common patterns.
>
> The idea was to expose an API like this:
>
>      string makeParser()
>      {
>          auto builder = new ParserBuilder!char;
>          builder.pushSequence();
>              builder.literal('x');
>              builder.pushMaybe();
>                  builder.literal('y');
>              builder.pop();
>          builder.pop();
>          return builder.toDCode("callMe");
>      }

I was also toying with the idea of exposing Builder interface for 
std.regex. But push/pop IMHO are better be implicitly designed-out:

auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build();

... and letting the nesting be explicit.

Is the same as:
auto re = regex(`x(?:\p{L}y)*`);

Aimed for apps/libs that build regular expressions anyway and have no 
need in textual parser.

> That snippet would create a parser that recognizes the grammar 'x' (
> 'y'? ).
> The current fledgling implementation creates this parser:
> http://pastebin.com/MgSqWXE2
>
> Of course, no one would be expected to write grammars like that. It
> would be the job of tools like Pegged or std.regex to package it up in
> nice syntax that is easy to use.

I thought to provide some building blocks for that with new std.uni. Not 
quite everything I wanted, but now at least there is one set of wheels 
less to reinvent.

[snip]

> Another fun thought: PEGs can have look-behind that includes any regular
> elements without any additional algorithmic complexity. Just take all of
> the look-behinds in the grammar, mash them together into one big
> regular-expression using regular alternation (|), and then have the
> resulting automaton consume in lock-step with the PEG parser.  Whenever
> the PEG parser needs to do a lookbehind, it just checks to see if the
> companion automaton is in a matching state for the capture it needs.

Sounds quite slow to do it "just in case". Also complete DFAs tend to be 
mm quite big.

What ANTLR does is similar technique - a regular lookahead to resolve 
ambiguity in the grammar (implicitly). A lot like LL(k) but with 
unlimited length (so called LL(*)). Of course, it generates LL(k) 
disambiguation where possible, then LL(*), failing that the usual 
backtracking.

>
> *sigh*, I feel like I could write a paper on this stuff if I were in
> grad school right now.  Alas, I am stuck doing 50-60 hours a week of
> soul-sucking business programming.

I heard Sociomantic is hiring D programmers for coding some awesome 
stuff, you may as well apply :)

> Well, then again, my understanding
> is that even though I can think of things that seem like they would make
> interesting topics for publishable papers, reality would have the profs
> conscript me to do completely different things that are possibly just as
> inane as the business programming.
>
Speaking for my limited experience - at times it's like that.

> I worry that the greater threat to good AST manipulation tools in D is a
> lack of free time, and not the DMD bugs as much.

Good for you I guess, my developments in related area are blocked still :(

> I hope this is useful to someone!

-- 
Dmitry Olshansky