Daniel Keep wrote:
>
> Funny thing, turns out SSE is actually *slower* for doing a dot product
> than regular old x87 code!

You should qualify this - I'm guessing you mean for a single dot product? If so, this is the case in most vector coprocessors, as load/store overhead can easily outweigh the gains in vectorization.

--Steve
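
To illustrate the point about a single dot product, here is a rough C sketch (not the code from the original benchmark, which isn't shown here). The function names `dot4_scalar` and `dot4_sse` are made up for this example, and it assumes an x86 target with SSE1 and 4-element float vectors. Note how the SSE version spends most of its instructions loading the operands and then shuffling the products back down into one scalar, while only a single instruction does the parallel multiply.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics; assumes an x86/x86-64 target */

/* Scalar version: four multiplies and three adds, no packing or unpacking. */
float dot4_scalar(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* SSE version (hypothetical sketch): one packed multiply, but it is
   surrounded by two unaligned loads and a horizontal-sum sequence. */
float dot4_sse(const float *a, const float *b)
{
    __m128 va   = _mm_loadu_ps(a);              /* load overhead */
    __m128 vb   = _mm_loadu_ps(b);              /* load overhead */
    __m128 prod = _mm_mul_ps(va, vb);           /* p0 p1 p2 p3: the only parallel work */

    /* Horizontal sum: fold the four products into lane 0. */
    __m128 hi    = _mm_movehl_ps(prod, prod);   /* p2 p3 p2 p3 */
    __m128 sum2  = _mm_add_ps(prod, hi);        /* p0+p2, p1+p3, (unused), (unused) */
    __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 total = _mm_add_ss(sum2, lane1);     /* lane 0 = p0+p1+p2+p3 */

    return _mm_cvtss_f32(total);                /* move the result out of the vector register */
}
```

Whether the SSE version actually loses depends on the CPU and compiler, so treat this as a picture of where the overhead comes from rather than a measurement. The balance usually shifts when many dot products are processed in a batch, since the loads and the final reduction can then be amortized.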