Fascinating new switch mechanism in assembler

Mon Mar 20 02:33:16 PST 2006

> ?test@@YAHH@Z:
>  push EBX
>  mov EBX,8[ESP]
>  sub EBX,1
>  cmp EBX,6
>  ja L1A
>  jmp dword ptr FLAT:_DATA[00h][EBX*4]
>  mov EAX,8[ESP]
>  pop EBX
>  ret
> L1A:  mov EAX,8[ESP]
>  inc EAX
>  pop EBX
>  ret
> _TEXT ends
> _DATA segment
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> dd offset FLAT:?test@@YAHH@Z[014h]
> _DATA ends

In fact, it's even better. I got this code (dmd -O -release -inline):

00402010 push ebx
00402011 mov ebx,eax
00402013 sub ebx,1
00402016 mov ecx,eax
00402018 cmp ebx,6
0040201B ja 00402026
0040201D jmp dword ptr [ebx*4+411080h]
00402024 pop ebx
00402025 ret
00402026 pop ebx
00402027 lea eax,[ecx+1]
0040202A ret

Although the point is clear (the compiler already uses jump tables), the 
code does not seem optimal. Of course, my assembly knowledge is rather rusty 
(instruction pairing for the Pentium 1 is the latest optimization I know 
of).

First of all, the jmp is useless (in fact, the whole jump table is useless): 
every table entry points to the same address. The cmp/ja pair already 
handles the "default:" case, so there's no need to further differentiate 
between the actual values. I would have written the switch as follows:

  push ebx
  mov ebx,eax
  dec ebx
  mov ecx,eax
  cmp ebx,6
  jbe L1A
  lea eax,[ecx+1]
L1A:
  pop ebx
  ret
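
For reference, here is the same idea one level up, as a rough C sketch. The 
source of the function was not posted, so test_switch below is only my guess 
at its shape, inferred from the listing (cases 1..7 return x, the default 
returns x + 1); both function names are made up for illustration. 
test_range does the work of the sub/cmp/ja sequence with a single unsigned 
compare and no table.

```c
/* Assumed shape of the original function (not shown in the thread):
   cases 1..7 return x unchanged, anything else returns x + 1. */
int test_switch(int x)
{
    switch (x) {
    case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        return x;
    default:
        return x + 1;
    }
}

/* Same result with one range check. Subtracting 1 and comparing as
   unsigned maps 1..7 to 0..6; every other input (including 0 and
   negative values) wraps to something larger than 6. */
int test_range(int x)
{
    unsigned biased = (unsigned)x - 1u;
    return biased <= 6u ? x : x + 1;
}
```

Whether a given compiler reduces the switch to the second form on its own is 
a separate question; the sketch only shows that the two are equivalent.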

What's the reason for the "sub ebx, 1" instead of "dec ebx"? Isn't an 
instruction using an immediate value slower (larger instruction => fewer 
instructions in cache) than one without? (At least that's what I remember 
from the Pentium 1.)

Would using the setbe/seta instructions improve the above code even more? 
It would get rid of the conditional jump, but I have no idea what the 
performance of those set* instructions is.
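
To make the set* question concrete, here is a minimal C sketch of the 
branch-free form. The function name is hypothetical, and it assumes the same 
behaviour as the listing (1..7 return x, everything else x + 1). In C a 
comparison yields 0 or 1, which is what a seta would put in a register, but 
whether a compiler actually emits seta/setbe for this is not guaranteed.

```c
/* Branch-free variant: add the 0/1 result of the range test to x.
   (unsigned)x - 1u > 6u is 1 exactly when x is outside 1..7. */
int test_branchless(int x)
{
    return x + (int)((unsigned)x - 1u > 6u);
}
```

For example, test_branchless(7) gives 7 and test_branchless(8) gives 9.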

L.