double vs real
Timon Gehr
timon.gehr at gmx.ch
Fri May 31 15:06:45 PDT 2013
On 05/31/2013 01:28 PM, Shriramana Sharma wrote:
> On Fri, May 31, 2013 at 4:31 PM, Timon Gehr <timon.gehr at gmx.ch> wrote:
>>
>> If double uses xmm registers and real uses the fpu registers (as is standard
>> on x64), then double multiplication has twice the throughput of real
>> multiplication on recent intel microarchitectures.
>
> Hi can you clarify that? I'm interested because I'm running a 64 bit
> system. What does twice the throughput mean? double is faster?
>
Depends. Two useful numbers to classify performance characteristics of
machine instructions are latency and reciprocal throughput.
Modern out-of-order processors are pipelined. I.e. instructions may take
multiple cycles to complete, and multiple instructions may run through
different stages of the pipeline at the same time.
Latency: The time taken from the point where all inputs are available
to the point where all outputs are available.
Reciprocal throughput: The minimum delay between the starts of two
independent instructions of the same kind.
Multiplying doubles in an xmm register has a latency of 5 cycles and a
reciprocal throughput of 1 cycle (on recent Intel microarchitectures).
Multiplying 'reals' in an fpu register has a latency of 5 cycles and a
reciprocal throughput of 2 cycles.
Therefore, doubles allow more instruction level parallelism (ILP).
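
To make those two numbers concrete, here is a rough sketch of the cycle
count for a run of independent multiplications. It assumes an idealised
model (one multiply may start every "reciprocal throughput" cycles, each
takes "latency" cycles, nothing else competes for the unit). The function
name cyclesForIndependent is made up for illustration.

```d
import std.stdio : writeln;

// Idealised model, not a measurement: cycles until the last of n
// independent multiplies has produced its result.
// (Hypothetical helper, not part of any library.)
int cyclesForIndependent(int n, int latency, int recipThroughput)
{
    if (n <= 0)
        return 0;
    // The last multiply starts after (n - 1) issue slots and then
    // needs `latency` cycles to complete.
    return (n - 1) * recipThroughput + latency;
}

void main()
{
    // Four independent multiplies with the numbers quoted above.
    writeln(cyclesForIndependent(4, 5, 1)); // double in xmm: prints 8
    writeln(cyclesForIndependent(4, 5, 2)); // real on the fpu: prints 11
}
```

The gap grows with the number of independent multiplies, which is what
"twice the throughput" amounts to in this model.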
However, if you have e.g. a computation like this one:
b = a*b*c*d;
Then there will not be a difference in runtime, as each multiplication
depends on the result of the previous one.
On the other hand, if you reassociate the expression as follows:
b = (a*b)*(c*d);
Then double will be one cycle faster, since the second multiplication
can be started one cycle earlier, and hence the third one can also start
one cycle earlier.
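
The two cases above can be checked with a small toy model. This is only
a sketch under simplifying assumptions: a single multiplier unit, in-order
issue, all of a, b, c, d available at cycle 0, and the latency and
reciprocal throughput figures quoted earlier. The names Multiplier,
chained and reassociated are invented for this example.

```d
import std.algorithm : max;
import std.stdio : writeln;

// Hypothetical model of one pipelined multiplier unit.
struct Multiplier
{
    int latency;          // cycles from start to result
    int recipThroughput;  // minimum cycles between two starts
    int nextFree = 0;     // earliest cycle the next multiply may start

    // Issue a multiply whose inputs are ready at cycle `ready`;
    // returns the cycle at which its result is available.
    int mul(int ready)
    {
        immutable start = max(ready, nextFree);
        nextFree = start + recipThroughput;
        return start + latency;
    }
}

// b = a*b*c*d, evaluated left to right as ((a*b)*c)*d.
int chained(Multiplier m)
{
    immutable ab = m.mul(0);
    immutable abc = m.mul(ab);
    return m.mul(abc);
}

// b = (a*b)*(c*d): the first two multiplies are independent.
int reassociated(Multiplier m)
{
    immutable ab = m.mul(0);
    immutable cd = m.mul(0);
    return m.mul(max(ab, cd));
}

void main()
{
    auto dbl = Multiplier(5, 1); // double in an xmm register
    auto rl = Multiplier(5, 2);  // real in an fpu register
    writeln(chained(dbl), " ", chained(rl));           // prints 15 15
    writeln(reassociated(dbl), " ", reassociated(rl)); // prints 11 12
}
```

In this model the chained form takes 15 cycles for both types, while the
reassociated form takes 11 cycles for double and 12 for real.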
If you are interested, more information is available here:
http://agner.org/optimize/#manuals