Reducing Pegged ASTs

Tue Nov 25 22:09:03 PST 2014

On Tue, Nov 25, 2014 at 4:12 PM, "Nordlöw"
<digitalmars-d-learn at puremagic.com> wrote:
> Is there a way to (on the fly) reduce Pegged parse results such as
>
> C [0, 6]["int", "x", ";"]
>  +-C.TranslationUnit [0, 6]["int", "x", ";"]
>     +-C.ExternalDeclaration [0, 6]["int", "x", ";"]
>        +-C.Declaration [0, 6]["int", "x", ";"]
>           +-C.DeclarationSpecifiers [0, 4]["int"]
>           |  +-C.TypeSpecifier [0, 4]["int"]
>           +-C.InitDeclaratorList [4, 5]["x"]
>              +-C.InitDeclarator [4, 5]["x"]
>                 +-C.Declarator [4, 5]["x"]
>                    +-C.DirectDeclarator [4, 5]["x"]
>                       +-C.Identifier [4, 5]["x"]
>
> to
>
> C [0, 6]["int", "x", ";"]
>  +-C.TranslationUnit [0, 6]["int", "x", ";"]
>     +-C.ExternalDeclaration [0, 6]["int", "x", ";"]
>        +-C.Declaration [0, 6]["int", "x", ";"]
>           +-C.TypeSpecifier [0, 4]["int"]
>           +-C.Identifier [4, 5]["x"]
>
> and still, when needed, be able query that C.Identifier is an instance of
> C.DirectDeclarator etc?

The reducing can be done, either with operators or semantic actions.
IIRC there is a free function in Pegged that does it.
I did not automate it, because every time I cut down severely a parse
tree, I later regret it because I lost information that way.

Cutting while still retaining original info (who is this node's
ancestor) is more difficult: you would have to store it somewhere
anyhow. You cannot create node classes to represent the hierarchy,
because of loops in the grammar: an Identifier can have many different
ancestors.

Note also that Pegged produces parse trees (complete parsing
information), not ASTs. ASTs would indeed be much smaller, but we
would have to define what are the master nodes in the D grammar.

> If not this seems like a cool feature to have ;)
>
> I guess it would reduce memory requirements by about a magnitude right?

If you want to remember the intermediate nodes you cut down, not
really, since you still need to store them somehow.

I think what's consuming memory right now is that I duplicate the
matched strings at each level, even concatenating them at the higher
levels. I should let them only in the 'leaves' of the tree (heck, like
an AST).

Halas, I've no free time to code anything in D these days... but of
course, feel free to push any pull request you might have!