D parsing

Dmitry Olshansky dmitry.olsh at gmail.com
Tue Nov 5 11:25:31 PST 2013


05-Nov-2013 20:55, Philippe Sigaud wrote:
> On Tue, Nov 5, 2013 at 3:54 PM, Dmitry Olshansky
> <dmitry.olsh at gmail.com> wrote:
>
>
>     I was also toying with the idea of exposing a Builder interface for
>     std.regex. But IMHO push/pop are better designed out and made implicit:
>
>     auto re =
>     atom('x').star(charClass(__unicode.Letter),atom('y')).__build();
>
>     ... while letting the nesting be explicit.
>
>     This is the same as:
>     auto re = regex(`x(?:\p{L}y)*`);
>
>     It is aimed at apps/libs that build regular expressions anyway and
>     have no need for a textual parser.
>
> Another possible advantage is to reference external names inside your
> construction, thus naming other regexen or referencing external variables
> to deposit backreferences inside them.

Actually it's a bad, bad idea. It has a nice potential to destroy all 
optimization opportunities and performance guarantees of the engine (like 
matching in linear time, and even that only holds today when no funky 
extensions are used).

After all, I'm in the curious position of having to do some work at run 
time as well, where you can't always just generate some D code ;)

What would be really nice though is to let users register their own 
dictionary of 'tokens'. Then things like an IPv4 pattern or a domain 
name pattern become as simple to use as the `\d` pieces they use today 
(say with \i{user-defined-name}).
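
For illustration, something close to this can be approximated today purely 
at the text level, by expanding names before the pattern ever reaches the 
parser. This is only a sketch under assumptions: `expandTokens` and the 
`octet` entry are made-up names, and std.regex itself does not understand 
`\i{...}`.

```d
import std.exception : enforce;
import std.regex;

// Hypothetical helper, not part of std.regex: replaces every \i{name}
// with the dictionary entry for `name`, wrapped in a non-capturing group,
// and returns the expanded pattern text.
string expandTokens(string pattern, string[string] tokens)
{
    auto tokenRef = regex(`\\i\{([^}]+)\}`);
    return replaceAll!((Captures!string m) {
        auto body_ = m[1] in tokens;
        enforce(body_ !is null, "unknown token: " ~ m[1]);
        return "(?:" ~ *body_ ~ ")";
    })(pattern, tokenRef);
}

void main()
{
    // User-registered dictionary (contents are an example only).
    string[string] tokens = [
        "octet": `25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d`
    ];

    // An IPv4 pattern written in terms of the named piece.
    auto ipv4 = regex(expandTokens(`^\i{octet}(?:\.\i{octet}){3}$`, tokens));

    assert(!matchFirst("192.168.0.1", ipv4).empty);
    assert(matchFirst("256.1.1.1", ipv4).empty);
}
```

Since the expansion is plain text substitution, the result is still an 
ordinary pattern and nothing changes about how it is matched. A built-in 
dictionary could go further and share the compiled pieces.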

 > All in all, to get a regex
 > construct that can interact with the external world.

Well, I can think of some rather interesting ways to do it even w/o tying 
in external stuff as building blocks. It's more about making std.regex 
itself less rigid and leaner (as in cheap to invoke). Then external 
modules may slice and dice its primitives as they see fit.

>
>     What ANTLR does is a similar technique - a regular lookahead to
>     resolve ambiguity in the grammar (implicitly). A lot like LL(k) but
>     with unlimited length (so-called LL(*)). Of course, it generates
>     LL(k) disambiguation where possible, then LL(*), and failing that
>     the usual backtracking.
>
> I liked that idea since the author added it to ANTLR, but I never used
> it since.
> I wonder whether that can be implemented inside another parser generator
> or if it uses some specific-to-ANTLR internal machinery.

I don't think there is much that is specific to ANTLR in it. You would 
though have to accept that it's no longer a PEG but rather some hybrid 
top-down EBNF parser that resolves ambiguities.
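
To make the idea concrete, here is a toy sketch of a regular-lookahead 
decision, not ANTLR's actual machinery. The `Alt` enum, the `predict` 
function and the two statement forms are assumptions made up for this 
example. The qualified name can be arbitrarily long, so no fixed k is 
enough to tell the alternatives apart, but a regular language scanned 
ahead is.

```d
import std.regex;

enum Alt { assignment, call, unknown }

// A dotted name of any length, followed by optional whitespace.
enum qualifiedName = `[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*\s*`;

// Hypothetical decision function: looks ahead over the remaining input
// with a regular expression and picks the alternative to parse,
// without consuming anything.
Alt predict(string rest)
{
    // name '=' (but not '==') predicts an assignment.
    if (!matchFirst(rest, regex(`^` ~ qualifiedName ~ `=[^=]`)).empty)
        return Alt.assignment;
    // name '(' predicts a call.
    if (!matchFirst(rest, regex(`^` ~ qualifiedName ~ `\(`)).empty)
        return Alt.call;
    // No prediction: a real parser would fall back to backtracking here.
    return Alt.unknown;
}

void main()
{
    assert(predict("a.b.c.d = 1;") == Alt.assignment);
    assert(predict("a.b.c.d(1);") == Alt.call);
    assert(predict("+x;") == Alt.unknown);
}
```

Nothing in this depends on the parser generator itself, only on having a 
way to run a regular scan ahead of the current position.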

>         I worry that the greater threat to good AST manipulation tools
>         in D is a
>         lack of free time, and not the DMD bugs as much.
>
>
>     Good for you I guess, my developments in a related area are still
>     blocked :(
>
> Walter is far from convinced that AST manipulation is a good thing. You
> would have to convince him first.

I thought it was about tools that work with D code like, say, lints, 
refactoring tools, etc.


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list