Compiler optimizations (I'm baffled)

Thu May 4 05:10:00 PDT 2006

It looks to me as though your double division is actually encoded as float
division, i.e. 32 bits. I am not too familiar with this syntax, but I believe
double precision would look like this:

fldl	LC0
fdivl	-12(%ebp)
fstpl	-8(%ebp)

I am not surprised that the FPU version is faster than the int one. You will
probably find that the timings on an Intel CPU differ from those on an AMD one
as well.

In article <e3co5o$jth$1 at digitaldaemon.com>, Bruno Medeiros says...
>
>Thomas Kuehne wrote:
>> Bruno Medeiros schrieb am 2006-05-03:
>>> Walter Bright wrote:
>>>> Craig Black wrote:
>>>>>  This is
>>>>> because integer division is essentially floating point division under the
>>>>> hood.
>>> I ran these tests and I got basically the same results (the int division 
>>> is slower). I am very intrigued and confused. Can you (or someone else) 
>>> explain briefly why this is so?
>>> One would think it would be the other way around (float being slower) or 
>>> at least the same speed.
>> 
>> 
>> The code doesn't necessarily show that int division is slower than float
>> multiplication.
>> 
>> What CPU are we talking about?
>> 
>> A naive interpretation of the "benchmark" assumes a single execution
>> pipe that does floating point and integer operations in sequence ...
>> 
>> Even assuming a single pipe: Why is the SSE version faster?
>> 
>> Does the benchmark measure the speed of int division against float
>> multiplication? 
>> 
>> Does the benchmark measure the throughput of int division against float
>> multiplication? 
>> 
>> Does the benchmark measure the throughput of int division of a set of
>> numbers through a constant factor against float multiplication of the
>> same set through (1 / constant factor)?
>> 
>> Thomas
>
>Hum, yes I should have been more specific. I only ran (a modified 
>version of) the latest test, which measured the throughput of int 
>division against double division (I hope...).
>Let me just post the code:
>
>#include <stdio.h>
>#include <time.h>
>
>//typedef double divtype;
>typedef int divtype;
>
>int main()
>{
>    clock_t start = clock();
>
>
>    divtype result = 0;
>    divtype div=1;
>
>    for(int max = 100000000; div < max; div++)
>    {
>      result = (42 / div);
>    }
>
>
>    clock_t finish = clock();
>    double duration = (double)(finish - start) / CLOCKS_PER_SEC;
>    printf("[%f] %2.2f seconds\n", double(result),duration);
>}
>
>------------------------------------
>I ran the tests with GCC, with both -O0 and -O2, on an Athlon XP, and in 
>both cases the typedef double divtype version was about twice as fast. 
>The assembly code I get for line 17 is the following:
>
>*** INT:
>
>.stabn 68,0,17,LM6-_main
>LM6:
>	movl	$42, %edx
>	movl	%edx, %eax
>	sarl	$31, %edx
>	idivl	-12(%ebp)
>	movl	%eax, -8(%ebp)
>
>*** DOUBLE:
>
>.stabn 68,0,17,LM6-_main
>LM6:
>	flds	LC0
>	fdivs	-12(%ebp)
>	fstps	-8(%ebp)
>
>
>I have little idea what it is that it's doing.
>
>-- 
>Bruno Medeiros - CS/E student
>http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D