Switch implementation
Iain Buclaw
ibuclaw at ubuntu.com
Tue Sep 28 15:46:01 PDT 2010
== Quote from Pelle (pelle.mansson at gmail.com)'s article
> On 09/28/2010 07:33 PM, bearophile wrote:
> > Through Reddit I have found a small article about reverse engineering the
switch statement:
> > http://www.codeproject.com/KB/cpp/switch.aspx
> >
> > I have compiled a test program with GCC and then with DMD with minimal
changes, this is the D program and the asm from the two compilers:
> >
> > ... listing ...
> >
> > gcc and llvm-gcc use a binary search, dmd a linear one.
> >
> > I have done a similar interesting test (similar to switch5.cpp of the article
author) where a good implementation of the switch is a small table for part of the
cases and a binary tree for the other cases.
> >
> > Bye,
> > bearophile
> Interesting. Do you have any performance comparisons? The input is
> fairly small, I'm not sure a binary search will help a lot.
Out of curiousity, I thought I might give a good stress test a try.
Specs of my machine:
Samsung N110, 2.1GHz Intel Atom, 2GB Memory.
Code looks like this: It's a switch statement with 20,000 cases.
import std.c.stdio: puts;
import std.c.stdlib: atoi;
void f1() { puts("f1 called"); }
void f2() { puts("f2 called"); }
void f3() { puts("f3 called"); }
void main() {
int i = atoi("20000");
switch (i) {
case 1: f1(); break;
case 2: f2(); break;
case 3: f3(); break;
// Turtles all the down...
case 19999: f1(); break;
case 20000: f2(); break;
default: f3(); break;
}
}
Compiling with gdc -O3. objdump output shows us the binary search bearophile
mentioned:
lea 0x0(%esi),%esi
jmp *0x80a9350(,%eax,4)
nop
call 8049430 <_D4test2f2FZv>
lea 0x0(%esi),%esi
jmp 8049491 <_Dmain+0x21>
lea 0x0(%esi),%esi
call 8049410 <_D4test2f3FZv>
lea 0x0(%esi),%esi
jmp 8049491 <_Dmain+0x21>
lea 0x0(%esi),%esi
call 8049430 <_D4test2f2FZv>
...
Time it takes to compile:
real 2m0.061s
user 1m46.567s
sys 0m1.176s
Time it takes to run:
real 0m0.007s
user 0m0.004s
sys 0m0.004s
What is interesting (for me) is that the compile time *could* be quicker if the
semantic for CaseStatement is little smarter at checking for duplicate cases.
The current implementation is (in pseudo code), is a very linear
for (int i = 0; i < num_searched_cases; i++)
check_is_dupe();
So if we have 20,000 cases in a statement, the compiler will iterate over the loop
like so:
0
0 1
0 1 2
0 1 2 3
...
0 .. 19999
0 .. 20000
Which is slow on any machine...
I would post results for DMD, but that is still compiling 20 minutes in...
Currently writing out function main to object file...
This is certainly a first for me, but this one user case shows that DMD *can* be
woefully slower than GDC at something. :3
More information about the Digitalmars-d
mailing list