Let's stop parser Hell

Sat Jul 7 16:31:16 PDT 2012

On 07/07/2012 04:26 PM, David Piepgrass wrote:
>> auto captures = syntaxNode.matchNodes(
>> TOK_WHILE_NODE,
>> OP_ENTER_NODE,
>> OP_CAPTURE(0),
>> OP_BEGIN,
>> TOK_EXPRESSION,
>> OP_END,
>> OP_CAPTURE(1),
>> OP_BEGIN,
>> TOK_STATEMENT,
>> OP_END,
>> OP_LEAVE_NODE);
>
> I'm glad to hear you like the tree-parsing approach, Chad, although the
> particular syntax here looks pretty unfriendly :O -- does this represent
> something that you are working on right now?
>

Yes and yes.

I didn't choose this because it's pretty.

I chose it because:
(1) It's easy to implement.
(2) Both the implementation and syntax can be altered easily.

I do not have time to write a language for the tree pattern recognition 
and substitution that is needed to do this in an aesthetically pleasing 
way.  I've tried to sketch what it might look like before, and even then 
it is hard to make it nice, much less begin implementing the thing.  I'd 
love to have such a language, but resource constraints exist.

I also think that this approach would allow me to find out what my usage 
patterns look like before I commit to a more complicated 
architecture/tool.  I really think the design of this regex/language/DSL 
thing should be dominated by its usage.  This is a tricky 
chicken-and-egg thing because it's not currently used.  The hacky syntax 
you see is the bootstrapping.

Point (2) is important because, since we don't have existing usage 
patterns, this thing is going to change.  It's going to change /a lot/.
I want it to be easy to change.  I think a complete DSL will be harder
to change quickly.

I also like how it doesn't require a lot of CTFE trickery or pushing DMD 
too far.  D has really cool features, but I find that when I use things 
like CTFE aggressively then I lose productivity because I end up 
spending a lot of time finding compiler bugs.  This leads to my current 
strategy: use the simpler features that work for sure, and only use the 
more advanced stuff when I really need to.  I think my syntax fits this 
strategy and thus contributes to point (1).
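
To illustrate how little machinery a flat opcode list needs, here is a
minimal sketch that uses only arrays, enums and a switch.  It is not the
actual implementation: Kind, Op, Node, Instr and this matchNodes are
hypothetical names invented for the example, and the opcode set is cut
down (no OP_BEGIN/OP_END grouping).

```d
// Hypothetical node kinds and pattern opcodes, invented for this sketch.
enum Kind { whileNode, expression, statement }
enum Op { tok, enterNode, leaveNode, capture }

class Node
{
    Kind kind;
    Node[] children;
    this(Kind kind, Node[] children = null)
    {
        this.kind = kind;
        this.children = children;
    }
}

struct Instr
{
    Op op;
    Kind kind;    // only meaningful for Op.tok
    size_t slot;  // only meaningful for Op.capture
}

// Small helpers so a pattern reads like the argument list in the quote.
Instr tok(Kind k)       { return Instr(Op.tok, k, 0); }
Instr capture(size_t n) { return Instr(Op.capture, Kind.init, n); }
enum enter = Instr(Op.enterNode);
enum leave = Instr(Op.leaveNode);

// Matches a run of sibling nodes against the pattern, starting at pc.
bool matchSiblings(Node[] nodes, const(Instr)[] pattern,
                   ref size_t pc, ref Node[] captures)
{
    size_t i = 0;             // next sibling to match
    bool wantCapture = false; // set by Op.capture, consumed by next Op.tok
    size_t slot = 0;

    while (pc < pattern.length)
    {
        const ins = pattern[pc++];
        final switch (ins.op)
        {
        case Op.capture:
            wantCapture = true;
            slot = ins.slot;
            break;
        case Op.tok:
            if (i >= nodes.length || nodes[i].kind != ins.kind)
                return false;
            if (wantCapture)
            {
                if (captures.length <= slot)
                    captures.length = slot + 1;
                captures[slot] = nodes[i];
                wantCapture = false;
            }
            ++i;
            break;
        case Op.enterNode:
            // Descend into the children of the node matched just before.
            if (i == 0 || !matchSiblings(nodes[i - 1].children,
                                         pattern, pc, captures))
                return false;
            break;
        case Op.leaveNode:
            return i == nodes.length; // all children must be consumed
        }
    }
    return i == nodes.length;
}

bool matchNodes(Node root, const(Instr)[] pattern, out Node[] captures)
{
    size_t pc = 0;
    return matchSiblings([root], pattern, pc, captures)
        && pc == pattern.length;
}

void main()
{
    auto cond = new Node(Kind.expression);
    auto bodyStmt = new Node(Kind.statement);
    auto loop = new Node(Kind.whileNode, [cond, bodyStmt]);

    Node[] captures;
    assert(matchNodes(loop,
        [tok(Kind.whileNode), enter,
         capture(0), tok(Kind.expression),
         capture(1), tok(Kind.statement),
         leave],
        captures));
    assert(captures[0] is cond && captures[1] is bodyStmt);
}
```

Adding or renaming an opcode in a design like this means touching one
enum member and one switch case, which is the kind of cheap change
point (2) is after.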

That said, it is good that even mostly-working CTFE exists and that a 
powerful template and metaprogramming system exists, because I don't 
think a compiler like this would be very practical to program otherwise.
It would be doable in other languages, but could easily suffer from
performance pessimizations due to being forced to compute everything at 
runtime.

If anyone has an approach that shares the above strengths and looks 
nicer or is more powerful, I'd love to see it.

>
> 5. It's risky 'cause I've never heard of anyone taking this approach
> before. Bring on the danger!
>

The danger is the fun part! <g>

>
>> I wanted to make such a front-end so that I could easily make a C
>> backend. I believe such a compiler would be able to do that with great
>> ease.
>>
>> Needing to use D in places where it isn't available is a real
>> pain-point for me right now, and I'll probably find ways to spend time
>> on it eventually.
>
> Yeah, with a tree-transforming parser, I imagine the same thing, except
> my current fetish is to convert a certain subset of D to multiple other
> languages automatically. Then I could write libraries that can easily be
> used by an astonishingly large audience. I certainly would like to see D
> targetting Android, but that's best done directly from D to ARM.
>

That does sound very cool.  Possibly difficult though, due to having to 
cater to the lowest-common-denominator in all of your API designs.  No 
templated functions or ranges in your API, that's for sure.  I'm sure 
there are some things where this is very doable though; it probably 
depends on what kind of libraries you are writing.

As for D targeting Android, my intent is really to target X where X is 
any CPU/OS combo you can think of.  I want to be able to get D, the 
language, not necessarily Phobos or other niceties, to work on any
platform, and to do so without much work.  Cross-compiling to a new 
platform that has never been cross-compiled before should require zero 
coding.  Perhaps it might be possible to have a text file with some 
key-value configuration that tells it certain common features are 
available on the target, thus allowing you to have more features with 
almost no effort involved.
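
A rough sketch of what reading such a file could look like.  Everything
here is an assumption made up for illustration: the "key = value" line
format, the key names (pointer_bits, has_threads) and the
parseTargetConfig helper are hypothetical, not an existing format or
tool.

```d
import std.string : indexOf, splitLines, strip;

// Parses "key = value" lines; blank lines and '#' comments are skipped.
string[string] parseTargetConfig(string text)
{
    string[string] settings;
    foreach (line; text.splitLines())
    {
        auto s = line.strip();
        if (s.length == 0 || s[0] == '#')
            continue;
        auto eq = s.indexOf('=');
        if (eq < 0)
            continue; // this sketch silently ignores malformed lines
        const pos = cast(size_t) eq;
        settings[s[0 .. pos].strip()] = s[pos + 1 .. $].strip();
    }
    return settings;
}

void main()
{
    auto cfg = parseTargetConfig(
        "# hypothetical target description\n" ~
        "pointer_bits = 32\n" ~
        "has_threads = no\n");
    assert(cfg["pointer_bits"] == "32");
    assert(cfg["has_threads"] == "no");
}
```

A target with no such file would simply get the bare language with
every optional feature switched off.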

Still, I'll always take a crippled but existent D compiler that targets 
Android over a perfect but non-existent D compiler that targets Android.

I think that D-directly-to-ARM is the current approach for
cross-compiling.  I critique it for its underwhelming results.

> Anyway, the devil's in the detail. Originally I wanted to do a parser
> generator and a "completely general AST" in C# and couldn't seem to work
> out the details, but D is more flexible and is likely better suited to
> the task.

I can easily see how this is the case.  I don't think I'd be interested 
in doing a project like this in any other language.  I imagined trying 
to do something like this in C or Java or C# and it just doesn't seem 
practical.  For instance, I don't think the "use regular expressions to
match AST structures" approach would work well in other languages,
because it would either (1) have a bunch of runtime overhead from
compiling the expressions into DFAs at runtime or (2) suffer from
integration problems if you try to compile the expressions in separate
files before compiling the rest of the front-end.
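
For ordinary string regexes, Phobos already shows how D sidesteps both
problems: std.regex's ctRegex builds the matcher during compilation,
in the same source file that uses it.  The snippet below is only an
analogy on strings; a tree-matching equivalent is not something that
exists in Phobos, and the pattern itself is just an example.

```d
import std.regex : ctRegex, matchFirst;

void main()
{
    // The pattern is turned into matching code while this file compiles,
    // so no regex-compilation step is left for runtime.
    auto whileHeader = ctRegex!(`^while\s*\((.*)\)$`);

    auto m = matchFirst("while (i < 10)", whileHeader);
    assert(!m.empty);
    assert(m[1] == "i < 10");
}
```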