DConf 2013 Day 1 Talk 3: Distributed Caching Compiler for D by Robert Schadek

Dmitry Olshansky dmitry.olsh at gmail.com
Tue May 14 06:43:46 PDT 2013


13-May-2013 16:01, Andrei Alexandrescu пишет:
> Watch, discuss, vote up!
>
> http://reddit.com/r/programming/comments/1e8mwq/dconf_2013_day_1_talk_3_distributed_caching/
>
>

Nice one! Reminds me the gory days of our "compiler technologies" course :)

Our professor seemed obsessed with tabulated methods as well (Dragon 
book etc.) And above all - reverse polish notation as the one and holy 
AST representation. Don't you try to say recursive descent parser in the 
class...

Speaking of flat AST point in the talk  - reverse-polish notation is 
kind of neat in that it does represent flat AST and is easy to walk.
So one need not to have 2-array AST (childs + map to where are these).

Another advantage - you can slice off a sub-tree as simple as a chunk of 
an array! It should be good for CPU caches (all related stuff is in one 
block, no pointer chasing), but I never was close enough to a real 
compiler to run any meaningful benchmarks.

Let's see the idea of AST as polish-notation array applied to the tree 
in the talk. I show the forward one as it easier to read and less tricky 
to walk.

Using the following notation for each node:
<arity>, array of offsets * arity, type code+contents |  subtree(s)

Data is laid out byte-by-byte in raw-binary.
Offsets are in bytes starting right after type+contents.

So e.g. leaf node of int would be:
0, int
No offsets nor sub-trees, content/type tag etc. is marked as int

Then whole tree (tried to align for reading):

1, 0, S | 1, 0, DeclDefs |
   3, 0, <x>, <y>, Declarator |
     1, 0, BasicType | 0, int
     1, 0, Identifier | 0, main
     1, 0, DeclDefs | 1, 0, DeclDef | 1, 0 ReturnStatement | 0, "1"


<x> and <y> are sizes of sub-trees and can easily be calculated at 
construction. Insertion is simple array insertion + fix ups of offsets 
up the tree (you have to know your path in nodes anyway to back up).

Giving up on random access to child nodes would allow us to drop offset 
table completely. I have limited experience on semantic analysis to tell 
if it makes sense to require or drop it.


Will comment on the DFA & Unicode some time later, it's a neat topic in 
its own right.

-- 
Dmitry Olshansky


More information about the Digitalmars-d-announce mailing list