Let's stop parser Hell

Sat Jul 7 15:18:54 PDT 2012

On Saturday, 7 July 2012 at 22:03:20 UTC, Chad J wrote:
> enum : SyntaxElement
> {
>   AST_EXPRESSION          = 0x0001_0000_0000_0000,
>     AST_UNARY_EXPR        = 0x0000_0001_0000_0000 |

This would cause wasting space (probably a lot). Definitely it 
would not be easy to implement in a parser generator, when 
various language properties are not known beforehand for 
fine-grained tuning.

> This approach of course has shameful nesting limitations, but I 
> feel like these determinations could be fairly well optimized 
> even for the general case.  For example: another approach that 
> I might be more inclined to take is to give each token/symbol a 
> low-valued index into a small inheritance table.

Depending on implementation, that might introduce the multiplier 
overhead of table access per each comparison (and there would be 
many in case of searching for nodes of specific type).

> I would expect the regex engine to call the isA function as one 
> of it's operations.  Thus placing an AST_EXPRESSION into your 
> expression would also match an AST_NEGATE_EXPR too.

But actually it is not so difficult to implement in a very 
similar way to what you described. I was thinking about a lookup 
table, but different from a traditional inheritance table. It 
would be indexed by AST node type (integral enum value), and 
store various classification information as bits. Maybe this is 
what you meant and I misunderstood you... Example is here: 
https://github.com/roman-d-boiko/dct/blob/May2012drafts/fe/core.d 
(sorry, it doesn't show how to do classification, and has a 
different context, but I hope you get the idea). The advantage 
over storing hierarchical information directly in each token is 
obviously memory usage.