Let's stop parser Hell
Roman D. Boiko
rb at d-coding.com
Sat Jul 7 15:18:54 PDT 2012
On Saturday, 7 July 2012 at 22:03:20 UTC, Chad J wrote:
> enum : SyntaxElement
> AST_EXPRESSION = 0x0001_0000_0000_0000,
> AST_UNARY_EXPR = 0x0000_0001_0000_0000 |
This would cause wasting space (probably a lot). Definitely it
would not be easy to implement in a parser generator, when
various language properties are not known beforehand for
> This approach of course has shameful nesting limitations, but I
> feel like these determinations could be fairly well optimized
> even for the general case. For example: another approach that
> I might be more inclined to take is to give each token/symbol a
> low-valued index into a small inheritance table.
Depending on implementation, that might introduce the multiplier
overhead of table access per each comparison (and there would be
many in case of searching for nodes of specific type).
> I would expect the regex engine to call the isA function as one
> of it's operations. Thus placing an AST_EXPRESSION into your
> expression would also match an AST_NEGATE_EXPR too.
But actually it is not so difficult to implement in a very
similar way to what you described. I was thinking about a lookup
table, but different from a traditional inheritance table. It
would be indexed by AST node type (integral enum value), and
store various classification information as bits. Maybe this is
what you meant and I misunderstood you... Example is here:
(sorry, it doesn't show how to do classification, and has a
different context, but I hope you get the idea). The advantage
over storing hierarchical information directly in each token is
obviously memory usage.
More information about the Digitalmars-d