3D Math Data structures/SIMD
Jascha Wetzel
firstname at mainia.de
Sat Dec 22 03:41:39 PST 2007
Sascha Katzner wrote:
> Lukas Pinkowski wrote:
>> SIMD instructions are pretty 'old' now, but the compilers support them
>> only through non-portable extensions, or handwritten assembly.
>
> One reason could be that saving the SIMD registers (XMM1, XMM2, etc.)
> carries a performance penalty for the OS. You can verify that with the
> test program attached to this posting. If you uncomment lines 27-30, the
> program is ~50% slower (9.8s vs 6.7s on a Core 2 Duo E6750 on Vista).
>
> SSE is great if you do a lot of heavy computations in your program, but
> if you only do a dot product here and a cross product there, you had
> better not use SSE, because your whole program runs a lot slower once it
> uses SSE instructions.
Interesting! Since SSE is an integral part of x86-64, I wonder whether
this is an issue there as well...
Using the slightly modified code below, I tried that with GDC on 64-bit
Linux, and the timing was identical with and without the asm block. That
doesn't mean too much, but it is a hint. Further testing pending...
import tango.io.Stdout;
import tango.util.time.StopWatch;

struct Vector3f {
    float x, y, z;

    // v2 += v1: component-wise in-place addition
    void opAddAssign(ref Vector3f v) {
        x += v.x;
        y += v.y;
        z += v.z;
    }

    // v * s: scale every component, returning a new vector
    Vector3f opMul(float s) {
        return Vector3f(x * s, y * s, z * s);
    }
}

int main(char[][] args) {
    StopWatch elapsed;
    Vector3f v1 = {1.0f, 2.0f, 3.0f};
    Vector3f v2 = {4.0f, 5.0f, 6.0f};
    float t;

    // touch an SSE register once before the timed loop
    asm {
        movss XMM1, t;
    }

    elapsed.start;
    for (int i=0; i<0x40FFFFFF; i++) {
        // do something nontrivial...
        v2 += v1 * 3.0f;
    }
    auto duration = elapsed.stop;
    Stdout.formatln("{:6}", duration);

    // to ensure that the compiler doesn't eliminate/optimize the inner loop
    Stdout("(", v1.x, v1.y, v1.z, ") (", v2.x, v2.y, v2.z, ")").newline;
    return 0;
}
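
For anyone who wants to repeat the test without Tango, here is a rough port of the same benchmark to D2 with Phobos. This is only a sketch under some assumptions: `std.datetime.stopwatch` replaces Tango's StopWatch, the operator overloads use the D2 `opOpAssign`/`opBinary` templates, and `accumulate` is a hypothetical helper that does not appear in the original test. The asm block is compiled only where DMD-style inline assembly is available, so on other compilers the run corresponds to the variant without the SSE instruction.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

struct Vector3f {
    float x, y, z;

    // v2 += v1: component-wise in-place addition
    void opOpAssign(string op : "+")(Vector3f v) {
        x += v.x;
        y += v.y;
        z += v.z;
    }

    // v * s: scale every component, returning a new vector
    Vector3f opBinary(string op : "*")(float s) const {
        return Vector3f(x * s, y * s, z * s);
    }
}

// Hypothetical helper (not in the original post): repeats the same
// update as the timed loop and returns the accumulated vector.
Vector3f accumulate(Vector3f v1, Vector3f v2, uint iterations) {
    foreach (i; 0 .. iterations) {
        v2 += v1 * 3.0f;
    }
    return v2;
}

void main() {
    auto v1 = Vector3f(1.0f, 2.0f, 3.0f);
    auto v2 = Vector3f(4.0f, 5.0f, 6.0f);
    float t = 0.0f;

    // Touch an SSE register once before the timed loop, as in the post.
    // Skipped where DMD-style inline asm is not available.
    version (D_InlineAsm_X86_64) {
        asm { movss XMM1, t; }
    } else version (D_InlineAsm_X86) {
        asm { movss XMM1, t; }
    }

    auto sw = StopWatch(AutoStart.yes);
    v2 = accumulate(v1, v2, 0x40FFFFFF);
    sw.stop();

    writefln("%s ms", sw.peek.total!"msecs");
    // Print the result so the compiler cannot drop the loop entirely.
    writefln("(%s, %s, %s) (%s, %s, %s)", v1.x, v1.y, v1.z, v2.x, v2.y, v2.z);
}
```

To compare the two cases, build once as is and once with the asm block commented out, using the same compiler and flags for both runs.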