Computed gotos on Reddit

Tue Jul 24 16:31:22 PDT 2012

On 7/24/2012 3:33 PM, Dmitry Olshansky wrote:
> On 25-Jul-12 01:40, Walter Bright wrote:
>> On 7/24/2012 12:50 PM, Dmitry Olshansky wrote:
>>>> are the same (same as in same number of indirections).
>>>>
>>>>      switch (code[pc++])
>>>>
>>>> and
>>>>
>>>>      goto code[pc++]()
>>>>
>>>> are the same, too.
>>>
>>> It's not. Let's get to assembly then. It's an illustration as I'm no
>>> expert and
>>> may have made some illegal shortcuts in this listing.
>>>
>>> goto code[pc++]() is roughly:
>>>
>>> mov ecx, [ebx]
>>> inc ebx
>>> jmp [ecx]
>>
>>     jmp code[ecx]
>>
>
> There is no code just jump [ecx]. The offset is included already.

I'm not seeing where "code" is in the asm code you presented.

>
>>
>>> switch(code[pc++]) is:
>>>
>>> mov ecx, [ebx]
>>> inc ebx
>>> mov ecx, [edx+ecx] // assuming jump table is at edx
>>> jump [ecx]
>>
>>     jmp jumptable[ecx]
>
> Damn, it's not the same. If in the above ecx is pc++, here it it code[pc++] i.e.
> needs to be loaded. The whole point. And yes, I *didn't* object that it's still
> 1 JUMP. I'm more focused on extra work being done.

Sorry, I have no idea why it is not the same. jumptable is a static array, and 
so does not need to be loaded into a register. And code[] needs to be loaded 
from somewhere, and it looks like it's omitted from your example.

>> Please post source code example so I understand what you mean.
>
> OK. It sure gets confusing.
> Here is an article that shows the big picture of to what I want to do:
> http://www.complang.tuwien.ac.at/anton/lvas/sem06w/revucky.pdf
> It contains some advanced techniques that are out of scope of current topic but
> Introduction & Background are short and full of relevant facts.
>
> And for the sample code here it is, casts are ommited for brevity.
>
> First one is "if I had a way to have enforced tail call to function or take
> address of label".
>
> Let's assume labels:
>
> //GLOBALS
> size_t[] bytecode = firstPassCompile("my_source.ds"); //say it's DMDscript ;)
>
> bool jitted = false;
>
> void run(size_t pc)
> {
>      //so here bytecode is normal bytecode if jitted != true
>      //for simplicity sake it starts with some number e.g.:
>      // 0 - op_1
>      // 1 - op_2, etc.
>      if(!jitted)
>      {
>          int i=0;
>          //1:1 map of each label that servers specific opcode
>          static tabulated_ops[] =
>              [ &L_op1, &L_op2, &L_op3, ... ];
>
>          while(notEndOfProgram(bytecode[i]) )
>          {
>              size_t n = bytecode[i];
>              //store real target
>              bytecode[i] = tabulated_ops[n];
>              //advance by some size, skipping operands etc.
>              i += opSize(n);
>          }
>      }
>
>      //interpret:
>      goto bytecode[pc];
>      //<---- never gets here
> L_op1:
>      //do my op1 thing
>      pc += x;
> //DISPATH:
>      goto bytecode[pc]; // load address at pc & jump to it
> L_op2:
>      //do some other thing
>      pc += y;
> //DISPATH:
>
>      goto bytecode[pc]; // load address at pc & jump to it
> L_op3:
>      //my other op, jumps back
>      pc -= ...;
> //DISPATH:
>
>      goto bytecode[pc];  // load address at pc & jump to it
> ...
> L_opN:
>      ...
>      goto bytecode[pc];  // load address at pc & jump to it
> }
>
>
>
> Now the same thing with switches.
> And before you ask -
> NO! Single switch won't do as it would have successful branch prediction rate of
> as bad as about 2% (see the article!). The more opcode types the worse
> prediction is.

I see what you mean, but this could be done with final switch by the compiler, 
as I explained to bearophile.

>
> Anyway here it goes:
>
> //GLOBALS
> size_t[] bytecode = firstPassCompile("my_source.ds"); //say it's DMDscript ;)
>
> void run(size_t pc)
> {
>      //here we don't JIT to real addresses beforehand
>      //as jump tables somehow should be good enough
>      // Okay...
>
>      //interpret:
>      switch(bytecode[pc])
>      {
>      L_op1:
>      case 0:
>          //do my op1 thing
>          pc += x;
>      //DISPATCH:
>      //here comes trouble - 1st of N nearly identical tables??
>          switch(bytecode[pc])
>          {
>          case 0:    goto L_op1;
>          case 1: goto L_op2;
>          ...
>          }
>      L_op2:
>      case 1:
>          //do some other thing
>          pc += y;
>      //DISPATCH:
>      //here comes trouble - 2nd table?
>          switch(bytecode[pc])
>          {
>          case 0:    goto L_op1;
>          case 1: goto L_op2;
>          ...
>          }
>      L_op3:
>      case 2:
>          //my other op, jumps back
>          pc -= ...;
>      //DISPATCH:
>      //here comes trouble - 3rd table?
>          switch(bytecode[pc])
>          {
>          case 0:    goto L_op1;
>          case 1: goto L_op2;
>          ...
>          }
> ...
>      L_opN:
>      case N-1:
>      ...
>      //DISPATCH:
>      //here comes trouble Nth table ... time to count overhead
>          switch(bytecode[pc])
>          {
>          case 0:    goto L_op1;
>          case 1: goto L_op2;
>          ...
>          }
>      }//end of giant switch
> }

I see what you mean here, too. Thanks for the explanation. It never occurred to 
me that one could write code like that, but I see the point, and doing jump 
table merging could be done fairly easily. No new language feature is required.