Labels as values and threaded-code interpretation
    Alex Rønne Petersen 
    alex at lycus.org
       
    Fri May 31 22:29:27 PDT 2013
    
    
  
Hi,
I'm sure this has been brought up before, but I feel I need to bring it 
up again (because I'm going to be writing a threaded-code interpreter): 
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
This is an incredibly important extension. The final switch statement is 
not a replacement because it doesn't allow the programmer to store a 
label address directly into a code stream, which is what's essential to 
write a threaded-code interpreter.
The Erlang folks went through hell just to use this feature; see the 5th 
Q at: 
http://www.erlang.org/doc/installation_guide/INSTALL-WIN32.html#Frequently-Asked-Questions
The idea is to be able to write code like this:
----
import std.algorithm;
enum Op : ubyte
{
     imm,
     add,
     sub,
     // ...
     ret,
}
final class Insn
{
     Op op;
     size_t[] args;
     void* lbl;
     Insn next;
}
final class State
{
     Insn pc;
     size_t[64] regs;
}
size_t interp(Insn[] code)
{
     // Set up the instruction stream with label addresses
     // the first time that it is executed. Label addresses
     // are stable, so we only do this once.
     foreach (insn; code.filter!(x => !x.lbl)())
     {
         void* lbl;
         with (Op)
         {
             final switch (insn.op)
             {
                 case imm: lbl = &&handle_imm; break;
                 case add: lbl = &&handle_add; break;
                 case sub: lbl = &&handle_sub; break;
                 // ...
                 case ret: lbl = &&handle_ret; break;
             }
         }
         insn.lbl = lbl;
     }
     // Start interpreting the entry instruction.
     auto state = new State;
     state.pc = code[0];
     goto *state.pc.lbl;
     // Interpreter logic follows...
     // The whole point is to avoid unnecessary function
     // calls, so we use a mixin template for this stuff.
     enum advance = "state.pc = state.pc.next;" ~
                    "goto *state.pc.lbl;";
     handle_imm:
     {
         state.regs[state.pc.args[0]] = state.pc.args[1];
         mixin(advance);
     }
     handle_add:
     {
         state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] + 
state.regs[state.pc.args[2]];
         mixin(advance);
     }
     handle_sub:
     {
         state.regs[state.pc.args[0]] = state.regs[state.pc.args[1]] - 
state.regs[state.pc.args[2]];
         mixin(advance);
     }
     // ...
     handle_ret:
     {
         return state.regs[state.pc.args[0]];
     }
     assert(false);
}
----
Notice that there are no function calls going on. Just plain jumps all 
the way through. This is a technique that many real world interpreters 
use to get significant speedups, and I for one think we desperately need it.
Implementing it should be trivial as both LLVM and GCC support taking 
the address of a block. I'm sure the DMD back end could be extended to 
support it too.
-- 
Alex Rønne Petersen
alex at alexrp.com / alex at lycus.org
http://alexrp.com / http://lycus.org
    
    
More information about the Digitalmars-d
mailing list