Which is faster? ulong or double

janderson askme at me.com
Thu Sep 27 20:23:44 PDT 2007


Janice Caron wrote:
> I have this app I've written, and it needs to keep track of an integer
> quantity (time in microseconds, as it happens, but that's an
> unimportant detail). The point is, there are circumstances where the
> numbers involved get bigger than uint.max.
> 
> So the question is, given that I'm using a 32-bit platform, should I
> switch to ulong, or to double?
> 
> ulong sounds the most logical, since the quantity will always be an
> integer, but (correct me if I'm wrong) ulongs are emulated in
> software, which is fine for add and subtract, but not so fine for
> divide; whereas doubles have direct hardware support, and so might
> actually end up being faster if there are lots of divides.
> 
> Am I talking nonsense? Is there a recommendation?

The only way to tell is to benchmark it.  Also be aware that different 
CPUs will perform differently due to many factors, like branch 
prediction and being able to run certain double and integer operations 
at the same time.  On some processors it may be faster to interleave 
doubles and uints.  Some processors can even issue more floating point 
operations per cycle than integer ones (so the interleaving might work 
out to something like 4 doubles and 2 uints per cycle).
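As a starting point for that benchmark, here is a minimal sketch in C++ 
(the same idea translates directly to D); timeDivides is a hypothetical 
helper name, and the numbers it reports are only a hint for your own CPU:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical micro-benchmark helper: returns the milliseconds taken by
// one million divide-and-add operations on type T, so you can compare
// e.g. timeDivides<std::uint64_t>(1) against timeDivides<double>(1.0).
template <typename T>
double timeDivides(T start) {
    const int N = 1000000;
    volatile T acc = start;  // volatile keeps the loop from being optimised away
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 1; i <= N; ++i)
        acc = acc / static_cast<T>(3) + static_cast<T>(i);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Run it once for each type on the target machine; whichever is faster 
there is your answer, not anything measured on a different CPU.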

If you have a fast GPU you can offload this sort of operation to it, 
which, if you have enough of these values, can be something like 300 
times faster than the CPU.

Then there's SIMD via SSE, SSE2, SSE3, etc., which can do a load of 
operations at once (e.g. 4 float divides at the same time) and has some 
64-bit support (doubles, 64-bit ints).  It's similar to the GPU but with 
fewer operations.  I would recommend this over the GPU if you want your 
app to work on more systems.  See: 
http://www.hayestechnologies.com/en/techsimd.htm
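For a concrete picture of what "4 float divides at the same time" means, 
here is a minimal C++ sketch using the SSE intrinsics (divide4 is a 
hypothetical helper name, not part of any library):

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_div_ps

// Four float divisions issued as a single hardware instruction.
void divide4(const float num[4], const float den[4], float out[4]) {
    __m128 a = _mm_loadu_ps(num);          // load 4 floats (unaligned-safe)
    __m128 b = _mm_loadu_ps(den);
    _mm_storeu_ps(out, _mm_div_ps(a, b));  // all 4 divides at once
}
```

For example, dividing {10, 20, 30, 40} by {2, 4, 5, 8} fills out with 
{5, 5, 6, 5} in one _mm_div_ps instead of four scalar divides.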

On a 64-bit machine + OS, of course, it's pretty fast to do these 
operations natively in 64 bits.

You could try an app-level optimisation where anything larger than the 
boundary is stored in a separate list and processed separately (probably 
easy to do with templates).
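The split-by-boundary idea might look something like this C++ sketch 
(Split and splitByBoundary are hypothetical names; in D you'd likely 
templatise the two element types):

```cpp
#include <cstdint>
#include <vector>

// Values that fit in 32 bits go to a fast uint list; the rare larger
// ones go to a separate list handled with 64-bit (or double) arithmetic.
struct Split {
    std::vector<std::uint32_t> small;  // fast path
    std::vector<std::uint64_t> large;  // slow path
};

Split splitByBoundary(const std::vector<std::uint64_t>& values) {
    Split s;
    for (std::uint64_t v : values) {
        if (v <= UINT32_MAX)
            s.small.push_back(static_cast<std::uint32_t>(v));
        else
            s.large.push_back(v);
    }
    return s;
}
```

Whether this pays off depends on how rare the large values are; if most 
values cross the boundary, you are better off using one 64-bit list.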

However, the best thing to do is to profile and find out where your 
bottleneck is, and whether it's even worth the trouble of applying these 
optimisations.  Arithmetic operations are (in general) much faster than 
branching and other operations which cause memory fetches.



More information about the Digitalmars-d mailing list