Trivial benchmarking on Linux

Mon Mar 9 11:28:06 PDT 2009

Inspired by the recent benchmarking posts, I decided to do a little 
benchmarking of my own: I compared a loop that counts up with one that 
counts down.

import std.conv;  // for to!(long)

void main(char[][] args)
{
     // args[1] is the iteration count given on the command line
     auto count = to!(long)(args[1]);
     for(long i = 0; i < count; i++)  { /* do nothing */ }
}

I wanted to know whether it makes a difference if the loop counts 
backwards, so the other program (loopv.d) had the following line instead:

     for(long i = count; i > 0; --i)

So I compiled:

$ dmd loop.d
$ dmd loopv.d

Before running the benchmark, I stored the program name in a shell 
variable (really handy, because you will probably end up with several 
versions of your program):

$ p=loop
$ rm -f $p.bench;for a in {1..30} ; do
 >   (time $p 100000000) 2>> $p.bench ; done

To test the other program, I set p to the other program's name and then 
simply pressed up-arrow to recall the long benchmarking command.
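
The same steps could also be wrapped in a small function so that both 
programs are benchmarked in one go. This is only a sketch, assuming bash 
(it relies on the `time' keyword); the function name bench and its 
argument order are made up for illustration.

```shell
# Hypothetical helper: run a command several times and append each
# `time` report to a file. Usage: bench OUTFILE RUNS COMMAND [ARGS...]
bench() {
    local out=$1 runs=$2 i
    shift 2
    rm -f "$out"
    for i in $(seq "$runs"); do
        # bash writes the timing report to the subshell's stderr
        ( time "$@" ) 2>> "$out"
    done
}

# Assumes loop and loopv sit in the current directory:
# for p in loop loopv; do bench "$p.bench" 30 "./$p" 100000000; done
```

Note that anything the program itself prints to stderr also ends up in 
the .bench file, just as with the one-liner above.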

To see the best result of the 30 test runs, I wrote:

$ grep real loop.bench | sort | head -1
real	0m0.337s
$ grep real loopv.bench | sort | head -1
real	0m0.316s
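
A plain `sort' compares the times as text, which works here because all 
runs have the same number of digits, but it would misorder something like 
0m10.2s and 0m9.9s. A minimal sketch of a numeric summary, assuming the 
default bash `time' format shown above with a decimal point (the name 
summarize is made up):

```shell
# Hypothetical helper: print the minimum and the (lower) median wall-clock
# time in seconds from a file of bash `time` reports.
summarize() {
    grep '^real' "$1" |
    # "0m0.337s" -> minutes and seconds -> plain seconds
    awk '{ split($2, t, /[ms]/); print t[1] * 60 + t[2] }' |
    sort -n |
    awk '{ v[NR] = $1 }
         END { if (NR) printf "min %.3f median %.3f\n", v[1], v[int((NR + 1) / 2)] }'
}

# Usage: summarize loop.bench; summarize loopv.bench
```

The median is included because a single best run can be a lucky outlier; 
comparing both numbers gives a slightly better picture.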

As I expected, counting backwards was faster, but by less than I had 
expected: about 6% (0.316 s versus 0.337 s). I also ran the same 
benchmark with a 10x larger loop count and got similar results.

Then I got curious as to what the difference between these programs 
really was, and decided to take a look:

$ objdump -d loop.o > loop.asm
$ objdump -d loopv.o > loopv.asm
$ diff loop.asm loopv.asm

35,54c35,48
<   29:	89 45 f0             	mov    %eax,-0x10(%ebp)
<   2c:	89 55 f4             	mov    %edx,-0xc(%ebp)
<   2f:	c7 45 f8 00 00 00 00 	movl   $0x0,-0x8(%ebp)
<   36:	c7 45 fc 00 00 00 00 	movl   $0x0,-0x4(%ebp)
<   3d:	8b 55 fc             	mov    -0x4(%ebp),%edx
<   40:	8b 45 f8             	mov    -0x8(%ebp),%eax
<   43:	3b 55 f4             	cmp    -0xc(%ebp),%edx
<   46:	7f 11                	jg     59 <_Dmain+0x59>
<   48:	7c 05                	jl     4f <_Dmain+0x4f>
<   4a:	3b 45 f0             	cmp    -0x10(%ebp),%eax
<   4d:	73 0a                	jae    59 <_Dmain+0x59>
<   4f:	83 45 f8 01          	addl   $0x1,-0x8(%ebp)
<   53:	83 55 fc 00          	adcl   $0x0,-0x4(%ebp)
<   57:	eb e4                	jmp    3d <_Dmain+0x3d>
<   59:	31 c0                	xor    %eax,%eax
<   5b:	c9                   	leave
<   5c:	c3                   	ret
<   5d:	90                   	nop
<   5e:	90                   	nop
<   5f:	90                   	nop
---
 >   29:	89 45 f8             	mov    %eax,-0x8(%ebp)
 >   2c:	89 55 fc             	mov    %edx,-0x4(%ebp)
 >   2f:	83 7d fc 00          	cmpl   $0x0,-0x4(%ebp)
 >   33:	7c 12                	jl     47 <_Dmain+0x47>
 >   35:	7f 06                	jg     3d <_Dmain+0x3d>
 >   37:	83 7d f8 00          	cmpl   $0x0,-0x8(%ebp)
 >   3b:	76 0a                	jbe    47 <_Dmain+0x47>
 >   3d:	83 6d f8 01          	subl   $0x1,-0x8(%ebp)
 >   41:	83 5d fc 00          	sbbl   $0x0,-0x4(%ebp)
 >   45:	eb e8                	jmp    2f <_Dmain+0x2f>
 >   47:	31 c0                	xor    %eax,%eax
 >   49:	c9                   	leave
 >   4a:	c3                   	ret
 >   4b:	90                   	nop
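
The diff above also reports lines that differ only in their address or 
byte columns. A small filter could keep just the instruction text before 
diffing. This is a sketch that assumes GNU objdump's tab-separated 
columns; the name insns is made up, and jump targets will still differ 
whenever the code moves.

```shell
# Hypothetical filter: keep only the instruction text from `objdump -d`
# output. Columns are address, raw bytes, instruction, separated by tabs;
# header lines and byte-only continuation lines have fewer fields.
insns() {
    awk -F'\t' 'NF >= 3 { print $3 }'
}

# Usage (assumes loop.o and loopv.o from the compile step):
# objdump -d loop.o  | insns > loop.asm
# objdump -d loopv.o | insns > loopv.asm
# diff loop.asm loopv.asm
```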

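One sees that they are quite different. Both loops do the 64-bit 
arithmetic in two 32-bit halves (addl/adcl versus subl/sbbl), but the 
up-counting loop first loads i into registers and compares it against 
count in memory, while the down-counting loop compares i directly against 
the constant zero, which saves two instructions per iteration. (I bet 
Walter has some interesting commentary on this.)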

Concluding remarks:

It is almost trivial to do such benchmarking on Linux. Also, changing a 
program in just one place keeps the diff output small enough that one 
can easily see what actually changes in the compiled program.

The program `objdump' is part of GNU binutils and is standard on Linux. 
(One can also use the D utilities, but I happened to use objdump here.)