3D Math Data structures/SIMD

Sat Dec 22 03:41:39 PST 2007

Sascha Katzner wrote:
> Lukas Pinkowski wrote:
>> SIMD instructions are pretty 'old' now, but the compilers support them
>> only through non-portable extensions, or handwritten assembly.
> 
> One reason could be that it is a performance penalty for the OS to save 
> the SIMD registers (XMM1, XMM2, etc.). You can verify that with the 
> test program attached to this posting. If you uncomment lines 27-30 the 
> program is ~50% slower (9.8s vs 6.7s on a Core 2 Duo E6750 on Vista).
> 
> SSE is great if you do a lot of heavy computations in your program, but 
> if you only do a dot product here and a cross product there you are 
> better off not using SSE, because your whole program runs a lot slower 
> if you use SSE instructions.

Interesting! Since SSE is an integral part of x86-64, I wonder whether 
this is an issue there as well...
Using the slightly modified code below, I tried that with GDC on 64-bit 
Linux, and the timing was identical with and without the asm block. That 
doesn't mean too much, but it is a hint. Further testing pending...

import tango.io.Stdout;
import tango.util.time.StopWatch;

struct Vector3f {
	float x, y, z;

	void opAddAssign(ref Vector3f v) {
		x += v.x;
		y += v.y;
		z += v.z;
	}

	Vector3f opMul(float s) {
		return Vector3f(x * s, y * s, z * s);
	}
}

int main(char[][] args) {
	StopWatch elapsed;

	Vector3f v1 = {1.0f, 2.0f, 3.0f};
	Vector3f v2 = {4.0f, 5.0f, 6.0f};

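	// touch an SSE register once; per the quoted post, this is what
	// is supposed to trigger the slowdown
	float t;
	asm {
		movss	XMM1, t;
	}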

	elapsed.start;
	for (int i=0; i<0x40FFFFFF; i++) {
		// do something nontrivial...
		v2 += v1 * 3.0f;
	}
	auto duration = elapsed.stop;
	Stdout.formatln("{:6}", duration);

	// to ensure that the compiler doesn't eliminate/optimize the inner loop
	Stdout.formatln("({}, {}, {}) ({}, {}, {})", v1.x, v1.y, v1.z, v2.x, v2.y, v2.z);
	return 0;
}
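
For anyone who wants to repeat the comparison without commenting lines in and out, here is a rough sketch of the same test with the asm block wrapped in a version condition. The identifier TouchSSE is made up for this example and is not part of the original test program, and opAddAssign takes its argument by value here instead of by ref. Treat it as a starting point, not a definitive benchmark.

```d
import tango.io.Stdout;
import tango.util.time.StopWatch;

struct Vector3f {
	float x, y, z;

	// by value here (the listing above uses ref)
	void opAddAssign(Vector3f v) {
		x += v.x;
		y += v.y;
		z += v.z;
	}

	Vector3f opMul(float s) {
		return Vector3f(x * s, y * s, z * s);
	}
}

int main(char[][] args) {
	StopWatch elapsed;

	Vector3f v1 = {1.0f, 2.0f, 3.0f};
	Vector3f v2 = {4.0f, 5.0f, 6.0f};

	// TouchSSE is a hypothetical version identifier (an assumption of this
	// sketch). Only when it is set does the program touch an SSE register
	// before the timed loop.
	version (TouchSSE) {
		float t = 0.0f;
		asm {
			movss	XMM1, t;
		}
	}

	elapsed.start;
	for (int i = 0; i < 0x40FFFFFF; i++) {
		v2 += v1 * 3.0f;
	}
	auto duration = elapsed.stop;
	Stdout.formatln("{:6}", duration);

	// print the results so the compiler cannot drop the loop
	Stdout.formatln("({}, {}, {}) ({}, {}, {})", v1.x, v1.y, v1.z, v2.x, v2.y, v2.z);
	return 0;
}
```

Building once without the identifier and once with it (-version=TouchSSE for DMD, -fversion=TouchSSE for GDC) gives the two variants from a single source file, so the timings can be compared directly on the same machine.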