MiniD 1.0 released!
Jarrett Billingsley
kb3ctd2 at yahoo.com
Thu Aug 2 06:14:15 PDT 2007
"Benji Smith" <dlanguage at benjismith.net> wrote in message
news:f8rmam$2a74$1 at digitalmars.com...
> Very nice. If you wouldn't mind (and if you have the time), I'd love to
> read more about the design of the MiniD interpreter. Are you using a
> parser generator for the lexing/parsing? Once you have your AST, do you
> internally generate VM bytecode, or do you walk the AST? What kinds of
> optimizations do you do? Is the VM/Interpreter purely stack-based, or do
> you use simulated registers? How do you implement your dynamic typing?
>
> Have you read any of the documentation of the parrot project? Back when
> they were publishing more regular updates, I enjoyed reading about their
> implementation details. If you're interested in writing about your
> experiences designing and implementing MiniD, I'd be very interested in
> reading about them.
That's a good idea for a wiki page. I'll put more stuff in the
"Compilation" and "Execution" sections of the spec.
But in short: I wrote my own lexer/parser from scratch, based on the DMD
frontend. It's a recursive descent parser. Once the AST has been
generated, the compiler walks the tree and generates bytecode. The VM is a
pseudo-register design; more precisely, it works like a native CPU stack, where
enough stack space is allocated for a function's call frame, and then all
accesses to the stack are by offsets from the function's base pointer. (If
you're at all familiar with the internal workings of Lua, MiniD works the
same way.) The compiler, though, does use a sort of stack machine to
convert expression ASTs into the RISC-style three-op bytecodes: it pushes
expressions onto the expression stack, and an operation (like add, call,
etc.) corresponds to a pop, which actually writes the code. This is my own
design (though I wouldn't be surprised if there were other compilers that
used a similar method) and it works pretty well :D
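As a rough illustration of that expression-stack code generator, here's a
minimal sketch in Python (my own toy reconstruction, not the actual D
implementation; all names are made up): operands are pushed onto an
expression stack, and popping an operation is what actually emits a
three-operand instruction.

```python
# Toy expression-stack code generator: pushing builds up pending operands,
# and each operation pops its inputs and writes a RISC-style three-op
# instruction into a fresh destination register.

class ExprCompiler:
    def __init__(self):
        self.stack = []      # registers holding pending expression results
        self.code = []       # emitted (op, dest, ...) instructions
        self.next_reg = 0

    def fresh_reg(self):
        r = f"r{self.next_reg}"
        self.next_reg += 1
        return r

    def push_const(self, value):
        # Load a constant into a fresh register and push that register.
        r = self.fresh_reg()
        self.code.append(("lc", r, value))
        self.stack.append(r)

    def op(self, name):
        # Popping an operation is the point where code gets written.
        rhs = self.stack.pop()
        lhs = self.stack.pop()
        dest = self.fresh_reg()
        self.code.append((name, dest, lhs, rhs))
        self.stack.append(dest)

# Compile (1 + 2) * 3:
c = ExprCompiler()
c.push_const(1)
c.push_const(2)
c.op("add")          # emits ("add", "r2", "r0", "r1")
c.push_const(3)
c.op("mul")          # emits ("mul", "r4", "r2", "r3")
```

After running this, the single register left on the stack holds the result
of the whole expression, which is exactly what makes the scheme compose for
nested expressions.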
Just as an example, here's some MiniD code:
local x = [1, 2, 3, 4, 5];
for(local i = 0; i < #x; i++)
    writefln("x[", i, "] = ", x[i]);
And here's something like what it compiles to (I did this in my head, so it
might not be exact):
Constants:
0: 1
1: 2
2: 3
3: 4
4: 5
5: 0
6: "x["
7: "] = "
8: "writefln"
// local x = [1, 2, 3, 4, 5];
newarr r2, 5
lc r3, c0
lc r4, c1
lc r5, c2
lc r6, c3
lc r7, c4
setarr r2, 0, 5
movl r1, r2
// for(local i = 0; i < #x
lc r2, c5
len r3, r1
cmp r2, r3
jge 8
// writefln("x[", i, "] = ", x[i]);
lc r5, c6
movl r6, r2
lc r7, c7
idx r8, r1, r2
precall r3, g8, 1
call r3, 6, 1
// ; i++)
addeq r2, c0
jmp -11
The opcodes:

- newarr is "new array": it takes the dest register and the size of the
  array (if known).
- lc is "load constant" (the constant table is unique to each function).
- setarr sets a block of registers all at once to several elements of the
  array, and is only used for array constructors.
- movl moves locals.
- len gets the length of something (the # operator).
- cmp compares two values, and is always followed immediately by jlt, jle,
  jgt, jge, je, or jne; the cmp-j pair is executed as a single instruction.
- idx indexes: it takes the destination register, then the thing to index,
  then the index value.
- precall and call are also executed as a single instruction and are used
  for calling a regular function. (If we were making a method call, it
  would be a method-call pair instead.) precall takes the register where
  the function should go, then the source of the function, then a flag (1
  or 0) indicating whether an explicit context was passed. "g8" as the
  source means "the global whose name is stored in constant 8", which is
  writefln. The flag is 1, so the interpreter will automatically fill the
  register after the function, r4, with the function's environment as the
  'this' pointer. (The flag is 0 when you use "with" at the beginning of
  the argument list to specify an explicit context.) Then call takes the
  function register to call (always the same as the register in precall),
  the number of parameters + 1, and the number of return values needed + 1.
  This is, again, very similar to how Lua does it.
- addeq is the += operator, so we're just adding 1 to i.
- jmp jumps back to the comparison of the loop. Notice that, like on a
  real processor, the PC is incremented after the instruction is fetched,
  so jump offsets are offsets from the instruction _after_ the jump.
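That jump convention is easy to get wrong, so here's a toy dispatch loop in
Python (my own sketch, not MiniD's actual VM; the opcodes addi and jlt are
invented for the example) showing offsets taken relative to the instruction
after the jump:

```python
# Toy dispatch loop illustrating PC-after-fetch jump semantics: the PC is
# bumped as soon as an instruction is fetched, so a jump offset is added
# to the address of the *next* instruction, not the jump itself.

def run(code):
    pc = 0
    acc = 0
    while pc < len(code):
        instr = code[pc]
        pc += 1                  # increment after fetch, like a real CPU
        op = instr[0]
        if op == "addi":         # acc += immediate
            acc += instr[1]
        elif op == "jlt":        # if acc < limit: pc += offset
            if acc < instr[1]:
                pc += instr[2]   # offset from the instruction after jlt
    return acc

# addi 1; jlt 5, -2
# The -2 counts back from the slot *after* jlt, landing on addi again,
# so this loops until acc reaches 5.
assert run([("addi", 1), ("jlt", 5, -2)]) == 5
```

The same counting explains the -11 in the listing above: eleven slots back
from the instruction after jmp lands on the len/cmp pair at the top of the
loop.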
Hope that whets your appetite :)