Labels as values and threaded-code interpretation
Alex Rønne Petersen
alex at lycus.org
Fri May 31 22:29:27 PDT 2013
Hi,
I'm sure this has been brought up before, but I feel I need to bring it
up again (because I'm going to be writing a threaded-code interpreter):
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
This is an incredibly important extension. The final switch statement is
not a replacement because it doesn't allow the programmer to store a
label address directly into a code stream, which is what's essential to
write a threaded-code interpreter.
The Erlang folks went through hell just to use this feature; see the 5th
Q at:
http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions
The idea is to be able to write code like this:
----
import std.algorithm;
enum Op : ubyte
{
imm,
add,
sub,
// ...
ret,
}
final class Insn
{
Op op;
size_t[] args;
void* lbl;
Insn next;
}
final class State
{
Insn pc;
size_t[64] regs;
}
size_t interp(Insn[] code)
{
// Set up the instruction stream with label addresses
// the first time that it is executed. Label addresses
// are stable, so we only do this once.
foreach (insn; code.filter!(x => !x.lbl)())
{
void* lbl;
with (Op)
{
final switch (insn.op)
{
case imm: lbl = &&handle_imm; break;
case add: lbl = &&handle_add; break;
case sub: lbl = &&handle_sub; break;
// ...
case ret: lbl = &&handle_ret; break;
}
}
insn.lbl = lbl;
}
// Start interpreting the entry instruction.
auto state = new State;
state.pc = code[0];
goto *state.pc.lbl;
// Interpreter logic follows...
// The whole point is to avoid unnecessary function
// calls, so we use a mixin template for this stuff.
enum advance = "state.pc = state.pc.next;" ~
"goto *state.pc.lbl;";
handle_imm:
{
state.regs[state.pc.args[0]] = state.pc.args[1];
mixin(advance);
}
handle_add:
{
state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] +
state.regs[state.pc.args[2]];
mixin(advance);
}
handle_sub:
{
state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] -
state.regs[state.pc.args[2]];
mixin(advance);
}
// ...
handle_ret:
{
return state.regs[state.pc.args[0]];
}
assert(false);
}
----
Notice that there are no function calls going on. Just plain jumps all
the way through. This is a technique that many real world interpreters
use to get significant speedups, and I for one think we desperately need it.
Implementing it should be trivial as both LLVM and GCC support taking
the address of a block. I'm sure the DMD back end could be extended to
support it too.
--
Alex Rønne Petersen
alex at alexrp.com / alex at lycus.org
http://alexrp.com / http://lycus.org
More information about the Digitalmars-d
mailing list