D vs VM-based platforms

Tue May 1 20:03:54 PDT 2007

Benji Smith wrote:
> Daniel Keep wrote:
>> Sorry; yes, you're right: it's for a single dot product.
>>
>> I'm surprised at this because of the sheer number of articles I ran
>> across touting "faster" dot product functions using SSE.  I have a
>> feeling these people have never bothered to actually *benchmark* their
>> "faster" functions :P
>>
>>     -- Daniel
> 
> 
> I'm also assuming that's for some low-dimensionality vector? I'd
> likewise guess that there's some sweet spot where dot product
> calculation is faster with SSE, even for a single pair of vectors, if
> the vectors are of sufficient dimensionality.
> 
> --benji

3D single-precision.  The problem seems to be a combination of unaligned
loads and the trickery you have to resort to in order to sum the XMM
register horizontally.  There's a dot product instruction (DPPS) in
SSE4.1, but I don't have a CPU that supports it.  :P
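
To make the horizontal-sum cost concrete, here is a rough sketch in C
with SSE intrinsics; it is not the code that was benchmarked.  The
function names dot3_scalar and dot3_sse are made up for illustration,
and the SSE version assumes each vector is stored as four floats with
the fourth set to 0 so a 16-byte load stays in bounds.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Plain scalar version: three multiplies, two adds. */
static float dot3_scalar(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* SSE version. ASSUMPTION: a and b each point to 4 floats,
 * with the 4th element set to 0.0f as padding. */
static float dot3_sse(const float *a, const float *b)
{
    /* Unaligned loads: no 16-byte alignment is required. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    /* One multiply gives all products: (p0, p1, p2, p3). */
    __m128 prod = _mm_mul_ps(va, vb);

    /* Horizontal sum, step 1: swap neighbours and add.
     * shuf = (p1, p0, p3, p2)
     * sums = (p0+p1, p0+p1, p2+p3, p2+p3) */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(prod, shuf);

    /* Step 2: bring the high pair down to lane 0 and add.
     * After movehl, lane 0 of shuf holds p2+p3. */
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);

    /* Lane 0 now holds p0+p1+p2+p3. */
    return _mm_cvtss_f32(sums);
}
```

The multiply is a single instruction, but the two shuffles and two adds
needed to collapse the register are about as much work as the scalar
version does in total, which fits what I measured for a single 3D dot
product.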

It also doesn't help that the compiler will inline the FPU functions,
but won't inline the SSE ones.

	-- Daniel
