Explicitly unimplemented computed gotos

Thu Nov 25 22:14:02 PST 2010

bearophile wrote:

> 1) Invent a syntax to represent and use them (probably the GCC syntax is good,
because it's already known in C).

I'd argue that we have a syntax reserved for them already:

switch(x) {
  case 0:
  case 1:
  case 2:
  [......]
}

And this is just a little optimization in the generated code.
(Specifically: if a switch has integer cases that fill in a
whole range from [0..n), it could be rewritten as pre-caching
the addresses in an array of length n and the dispatch simply
be written as a jump to the entry in that array.)

The beauty of using switch is it continues to work even in
compilers that don't implement the optimization, without
having to write the code twice.

> The difference is that D may never implement computed gotos.

I'm actually pretty amazed that we can't really do them right
now with the inline assembler.

Well... actually, we can, but it isn't the most beautiful
implementation.

Check this out:

=======
void main() {
	void* val;

lbl1:
	asm {
		call near ptr $+2; // call this location + 2 bytes - that is, the address of the
pop instruction
		jmp short lbl2; // FIXME: if the jmp isn't actually short, the whole thing blows
up! This feels like an assembler bug; it should probably bitch that I'm asking for
the impossible instead of just ignoring the short
		pop ESI; // it now holds the address pushed by call
		mov EAX, ESI; // load up the address for some work...
		add AL, [ESI - 1]; // ESI is address of the jmp opcode. +1 is the offset of the
jump... thus adding it gets the absolute address of the label. FIXME: what if al
overflows?
		mov [val], EAX; // store it

		jmp [val]; // and let's go ahead and make the computed goto. The program should
print "34"
	}

	printf("2");

lbl2:

	printf("3");

lbl3:

	printf("4");
}
=========

(I used printf to make the object dump a bit shorter than with writefln)

If you change to lbl3 it correctly prints 4. My voodoo worked!

If that were wrapped up into a string mixin and fixed those FIXME's - not terribly
hard in this specific case, but might be hard to generalize - we'd have a  general
way to getLabelAddress and jumpToPointer (the latter being trivial - that one jmp
[val]; instruction) implemented right in the library.

It won't be as efficient as a compiler generated jump table, but initialization is
unlikely to be a big deal here anyway. (What this code does is take the compiler's
generated code and reads it back at runtime! Obviously skipping that read back at
runtime step would be a lil faster, but since it is a one time cost it can
probably be ignored.)

I've started the mixin solution but it isn't quite right yet. I'm thinking I have
an off by one error when populating the array. Been a while since I've done
machine code reading and manipulation like this and it is getting late.

Nevertheless, I'm pretty convinced that the language offers everything we need to
implement this in the library right now, albeit the initialization won't be 100%
perfect. If I can fix this minor error in my current mixin loop, we can get very
close. Of course, reading the program's own binary code isn't the most elegant
implementation, but the usage is simple enough:

         // first is the name of the array to create, then the names of the labels
with which to populate it
        mixin(getLabelAddresses!("labels", "lbl1", "lbl2", "lbl3", "lbl4"));
        mixin(gotoPointer!("labels", 3)); // array name and index to jump to

Anyway I spent way too much time on this. Gotta get to bed. But my feeling on the
proposal again:

1) We have a syntax that works this way: switch(). The optimizer could, in theory,
recognize this usage and create it.

2) We have an inline assembler that can surely do the job. Let's use it and see
what we can do.