Computed gotos on Reddit

Tue Jul 24 15:33:19 PDT 2012

On 25-Jul-12 01:40, Walter Bright wrote:
> On 7/24/2012 12:50 PM, Dmitry Olshansky wrote:
>>> are the same (same as in same number of indirections).
>>>
>>>      switch (code[pc++])
>>>
>>> and
>>>
>>>      goto code[pc++]()
>>>
>>> are the same, too.
>>
>> It's not. Let's get to assembly then. It's an illustration as I'm no
>> expert and
>> may have made some illegal shortcuts in this listing.
>>
>> goto code[pc++]() is roughly:
>>
>> mov ecx, [ebx]
>> inc ebx
>> jmp [ecx]
>
>     jmp code[ecx]
>

There is no code just jump [ecx]. The offset is included already.

>
>> switch(code[pc++]) is:
>>
>> mov ecx, [ebx]
>> inc ebx
>> mov ecx, [edx+ecx] // assuming jump table is at edx
>> jump [ecx]
>
>     jmp jumptable[ecx]

Damn, it's not the same. If in the above ecx is pc++, here it it 
code[pc++] i.e. needs to be loaded. The whole point. And yes, I *didn't* 
object that it's still 1 JUMP. I'm more focused on extra work being done.

> Please post source code example so I understand what you mean.

OK. It sure gets confusing.
Here is an article that shows the big picture of to what I want to do: 
http://www.complang.tuwien.ac.at/anton/lvas/sem06w/revucky.pdf
It contains some advanced techniques that are out of scope of current 
topic but Introduction & Background are short and full of relevant facts.

And for the sample code here it is, casts are ommited for brevity.

First one is "if I had a way to have enforced tail call to function or 
take address of label".

Let's assume labels:

//GLOBALS
size_t[] bytecode = firstPassCompile("my_source.ds"); //say it's 
DMDscript ;)

bool jitted = false;

void run(size_t pc)
{
	//so here bytecode is normal bytecode if jitted != true
	//for simplicity sake it starts with some number e.g.:
	// 0 - op_1
	// 1 - op_2, etc.
	if(!jitted)
	{
		int i=0;
		//1:1 map of each label that servers specific opcode
		static tabulated_ops[] =
			[ &L_op1, &L_op2, &L_op3, ... ];

		while(notEndOfProgram(bytecode[i]) )
		{
			size_t n = bytecode[i];
			//store real target
			bytecode[i] = tabulated_ops[n];
			//advance by some size, skipping operands etc.
			i += opSize(n);
		}
	}

	//interpret:
	goto bytecode[pc];
	//<---- never gets here
L_op1:
	//do my op1 thing
	pc += x;
//DISPATH:
	goto bytecode[pc]; // load address at pc & jump to it
L_op2:
	//do some other thing
	pc += y;
//DISPATH:

	goto bytecode[pc]; // load address at pc & jump to it
L_op3:
	//my other op, jumps back
	pc -= ...;
//DISPATH:

	goto bytecode[pc];  // load address at pc & jump to it
...
L_opN:
	...
	goto bytecode[pc];  // load address at pc & jump to it
}

Now the same thing with switches.
And before you ask -
NO! Single switch won't do as it would have successful branch prediction 
rate of as bad as about 2% (see the article!). The more opcode types the 
worse prediction is.

Anyway here it goes:

//GLOBALS
size_t[] bytecode = firstPassCompile("my_source.ds"); //say it's 
DMDscript ;)

void run(size_t pc)
{
	//here we don't JIT to real addresses beforehand
	//as jump tables somehow should be good enough
	// Okay...

	//interpret:
	switch(bytecode[pc])
	{
	L_op1:
	case 0:
		//do my op1 thing
		pc += x;
	//DISPATCH:
	//here comes trouble - 1st of N nearly identical tables??
		switch(bytecode[pc])
		{
		case 0:	goto L_op1;
		case 1: goto L_op2;
		...
		}
	L_op2:
	case 1:
		//do some other thing
		pc += y;
	//DISPATCH:
	//here comes trouble - 2nd table?
		switch(bytecode[pc])
		{
		case 0:	goto L_op1;
		case 1: goto L_op2;
		...
		}
	L_op3:
	case 2:
		//my other op, jumps back
		pc -= ...;
	//DISPATCH:
	//here comes trouble - 3rd table?
		switch(bytecode[pc])
		{
		case 0:	goto L_op1;
		case 1: goto L_op2;
		...
		}
...
	L_opN:
	case N-1:
	...
	//DISPATCH:
	//here comes trouble Nth table ... time to count overhead
		switch(bytecode[pc])
		{
		case 0:	goto L_op1;
		case 1: goto L_op2;
		...
		}
	}//end of giant switch
}

-- 
Dmitry Olshansky