Switch implementation

Iain Buclaw ibuclaw at ubuntu.com
Tue Sep 28 15:46:01 PDT 2010


== Quote from Pelle (pelle.mansson at gmail.com)'s article
> On 09/28/2010 07:33 PM, bearophile wrote:
> > Through Reddit I have found a small article about reverse engineering the
switch statement:
> > http://www.codeproject.com/KB/cpp/switch.aspx
> >
> > I have compiled a test program with GCC and then with DMD with minimal
changes, this is the D program and the asm from the two compilers:
> >
> > ... listing ...
> >
> > gcc and llvm-gcc use a binary search, dmd a linear one.
> >
> > I have done a similar interesting test (similar to switch5.cpp of the article
author) where a good implementation of the switch is a small table for part of the
cases and a binary tree for the other cases.
> >
> > Bye,
> > bearophile
> Interesting. Do you have any performance comparisons? The input is
> fairly small, I'm not sure a binary search will help a lot.

Out of curiousity, I thought I might give a good stress test a try.

Specs of my machine:

Samsung N110, 2.1GHz Intel Atom, 2GB Memory.

Code looks like this: It's a switch statement with 20,000 cases.

import std.c.stdio: puts;
import std.c.stdlib: atoi;

void f1() { puts("f1 called"); }
void f2() { puts("f2 called"); }
void f3() { puts("f3 called"); }

void main() {
   int i = atoi("20000");

   switch (i) {
      case 1: f1(); break;
      case 2: f2(); break;
      case 3: f3(); break;
// Turtles all the down...
      case 19999: f1(); break;
      case 20000: f2(); break;
      default: f3(); break;
   }
}


Compiling with gdc -O3. objdump output shows us the binary search bearophile
mentioned:

lea    0x0(%esi),%esi
jmp    *0x80a9350(,%eax,4)
nop
call   8049430 <_D4test2f2FZv>
lea    0x0(%esi),%esi
jmp    8049491 <_Dmain+0x21>
lea    0x0(%esi),%esi
call   8049410 <_D4test2f3FZv>
lea    0x0(%esi),%esi
jmp    8049491 <_Dmain+0x21>
lea    0x0(%esi),%esi
call   8049430 <_D4test2f2FZv>
...

Time it takes to compile:
real	2m0.061s
user	1m46.567s
sys	0m1.176s

Time it takes to run:
real	0m0.007s
user	0m0.004s
sys	0m0.004s


What is interesting (for me) is that the compile time *could* be quicker if the
semantic for CaseStatement is little smarter at checking for duplicate cases.

The current implementation is (in pseudo code), is a very linear

for (int i = 0; i < num_searched_cases; i++)
    check_is_dupe();

So if we have 20,000 cases in a statement, the compiler will iterate over the loop
like so:
0
0 1
0 1 2
0 1 2 3
...
0 .. 19999
0 .. 20000

Which is slow on any machine...

I would post results for DMD, but that is still compiling 20 minutes in...
Currently writing out function main to object file...

This is certainly a first for me, but this one user case shows that DMD *can* be
woefully slower than GDC at something. :3


More information about the Digitalmars-d mailing list