Let's stop parser Hell
Chad J
chadjoan at __spam.is.bad__gmail.com
Sat Jul 7 16:31:16 PDT 2012
On 07/07/2012 04:26 PM, David Piepgrass wrote:
>> auto captures = syntaxNode.matchNodes(
>> TOK_WHILE_NODE,
>> OP_ENTER_NODE,
>> OP_CAPTURE(0),
>> OP_BEGIN,
>> TOK_EXPRESSION,
>> OP_END,
>> OP_CAPTURE(1),
>> OP_BEGIN,
>> TOK_STATEMENT,
>> OP_END,
>> OP_LEAVE_NODE);
>
> I'm glad to hear you like the tree-parsing approach, Chad, although the
> particular syntax here looks pretty unfriendly :O -- does this represent
> something that you are working on right now?
>
Yes and yes.
I didn't choose this because it because it's pretty.
I chose it because:
(1) It's easy to implement.
(2) Both the implementation and syntax can be altered easily.
I do not have time to write a language for the tree pattern recognition
and substitution that is needed to do this in an aesthetically pleasing
way. I've tried to sketch what it might look like before, and even then
it is hard to make it nice, much less begin implementing the thing. I'd
love to have such a language, but resource constraints exist.
I also think that this approach would allow me to find out what my usage
patterns look like before I commit to a more complicated
architecture/tool. I really think the design of this regex/language/DSL
thing should be dominated by its usage. This is a tricky
chicken-and-egg thing because it's not currently used. The hacky syntax
you see is the bootstrapping.
Point (2) is important because, since we don't have existing usage
patterns, this thing is going to change. It's going to change /a lot/.
I want it to be easy to change. I think a complete DSL will be harder
to change quickly.
I also like how it doesn't require a lot of CTFE trickery or pushing DMD
too far. D has really cool features, but I find that when I use things
like CTFE aggressively then I lose productivity because I end up
spending a lot of time finding compiler bugs. This leads to my current
strategy: use the simpler features that work for sure, and only use the
more advanced stuff when I really need to. I think my syntax fits this
strategy and thus contributes to point (1).
That said, it is good that even mostly-working CTFE exists and that a
powerful template and metaprogramming system exists, because I don't
think a compiler like this would be very practical to program otherwise.
It would be doable in other languages, but could easily suffer from
performance pessimizations due to being forced to compute everything at
runtime.
If anyone has an approach that shares the above strengths and looks
nicer or is more powerful, I'd love to see it.
>
> 5. It's risky 'cause I've never heard of anyone taking this approach
> before. Bring on the danger!
>
The danger is the fun part! <g>
>
>> I wanted to make such a front-end so that I could easily make a C
>> backend. I believe such a compiler would be able to do that with great
>> ease.
>>
>> Needing to use D in places where it isn't available is a real
>> pain-point for me right now, and I'll probably find ways to spend time
>> on it eventually.
>
> Yeah, with a tree-transforming parser, I imagine the same thing, except
> my current fetish is to convert a certain subset of D to multiple other
> languages automatically. Then I could write libraries that can easily be
> used by an astonishingly large audience. I certainly would like to see D
> targetting Android, but that's best done directly from D to ARM.
>
That does sound very cool. Possibly difficult though, due to having to
cater to the lowest-common-denominator in all of your API designs. No
templated functions or ranges in your API, that's for sure. I'm sure
there are some things where this is very doable though; it probably
depends on what kind of libraries you are writing.
As for D targeting Android, my intent is really to target X where X is
any CPU/OS combo you can think of. I want to be able to get D, the
language, not necessarily phobos or other niceties, to work on any
platform, and to do so without much work. Cross-compiling to a new
platform that has never been cross-compiled before should require zero
coding. Perhaps it might be possible to have a text file with some
key-value configuration that tells it certain common features are
available on the target, thus allowing you to have more features with
almost no effort involved.
Still, I'll always take a crippled but existent D compiler that targets
Android over a perfect but non-existent D compiler that targets Android.
I think that the D-directly-to-ARM is the current approach for
cross-compiling. I critique it for its underwhelming lack of results.
> Anyway, the devil's in the detail. Originally I wanted to do a parser
> generator and a "completely general AST" in C# and couldn't seem to work
> out the details, but D is more flexible and is likely better suited to
> the task.
I can easily see how this is the case. I don't think I'd be interested
in doing a project like this in any other language. I imagined trying
to do something like this in C or Java or C# and it just doesn't seem
practical. For instance, I don't think the "use regular expressions to
match AST structures" would work well in other cases because it would
either (1) have a bunch of runtime overhead for compiling the
expressions into DFAs at runtime or (2) suffer from integration problems
if you try to compile the expressions in separate files before compiling
the rest of the front-end.
More information about the Digitalmars-d
mailing list