Computed gotos on Reddit

Tue Jul 24 22:04:52 PDT 2012

On 25-Jul-12 03:31, Walter Bright wrote:
> On 7/24/2012 3:33 PM, Dmitry Olshansky wrote:
>> There is no code just jump [ecx]. The offset is included already.
>
> I'm not seeing where "code" is in the asm code you presented.

> Sorry, I have no idea why it is not the same. jumptable is a static
> array, and so does not need to be loaded into a register. And code[]
> needs to be loaded from somewhere, and it looks like it's omitted from
> your example.

Code was assumed to be in ebx obviously. BTW Static array for jump table 
is all good and well
  but does this trick work with PIC code? And last but not least - the 
jump target has to be _read_ from jump table
  and then jumped to it isn't it?

OK I've taken your comments into account.
Now I think I finally got it right:

mov ecx, [ebx] ; ecx = code[pc]
inc ebx ; pc ++
jmp ecx ; goto code[pc], as ecx is already a pointer

vs

mov ecx, [ebx] ; ecx = code[pc]
inc ebx ; ; inc pc
jump jump_table[ecx]; ; switch jump to it

or in english, ommiting PC increment:
1.
read x from array
jump to it
2.
  read x from array
  read jump address from 2nd static array at offset x
  jump to it

So to sum it all up: 1 vs 2 memory accesses, same register use, same 
(macro) instruction count.
Still I think that not having to touch extra static table is bonus and 
jump jump_table[ecx] could be less efficient on some processors then 
plain jump ecx (need to checks this).

>> Now the same thing with switches.
>> And before you ask -
>> NO! Single switch won't do as it would have successful branch
>> prediction rate of
>> as bad as about 2% (see the article!). The more opcode types the worse
>> prediction is.
>
> I see what you mean, but this could be done with final switch by the
> compiler, as I explained to bearophile.
>
Great but I still try to show that they are less efficient, see above.

>> //GLOBALS
>> size_t[] bytecode = firstPassCompile("my_source.ds"); //say it's
>> DMDscript ;)
>>
>> void run(size_t pc)
>> {
>>      //here we don't JIT to real addresses beforehand
>>      //as jump tables somehow should be good enough
>>      // Okay...
>>
>>      //interpret:
>>      switch(bytecode[pc])
>>      {
>>      L_op1:
>>      case 0:
[snip]
>>      L_opN:
>>      case N-1:
>>      ...
>>      //DISPATCH:
>>      //here comes trouble, Nth table ... time to count overhead
>>          switch(bytecode[pc])
>>          {
>>          case 0:    goto L_op1;
>>          case 1: goto L_op2;
>>          ...
>>          }
>>      }//end of giant switch
>> }
>
> I see what you mean here, too. Thanks for the explanation. It never
> occurred to me that one could write code like that, but I see the point,
> and doing jump table merging could be done fairly easily. No new
> language feature is required.

Superb! I actually tried the code above, generating common things with a 
help of string mixins, of course currently it only gets slightly slower.

Should I file an enhancement request?

-- 
Dmitry Olshansky