SIMD benchmark

Tue Jan 17 18:47:16 PST 2012

Timon Gehr wrote:
> The parameter is just squared and returned?

No, sorry, that code is all screwed up and missing a step.
My matrix multiply code looks like this:

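import std.traits : isImplicitlyConvertible;

// Returns m * this. x, y, z, w are the rows of this matrix.
auto transform(U)(Matrix4!U m) if (isImplicitlyConvertible!(U, T))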
{
    return Matrix4 (
        Vector4 (
            (m.x.x*x.x) + (m.x.y*y.x) + (m.x.z*z.x) + (m.x.w*w.x),
            (m.x.x*x.y) + (m.x.y*y.y) + (m.x.z*z.y) + (m.x.w*w.y),
            (m.x.x*x.z) + (m.x.y*y.z) + (m.x.z*z.z) + (m.x.w*w.z),
            (m.x.x*x.w) + (m.x.y*y.w) + (m.x.z*z.w) + (m.x.w*w.w)
        ),
        Vector4 (
            (m.y.x*x.x) + (m.y.y*y.x) + (m.y.z*z.x) + (m.y.w*w.x),
            (m.y.x*x.y) + (m.y.y*y.y) + (m.y.z*z.y) + (m.y.w*w.y),
            (m.y.x*x.z) + (m.y.y*y.z) + (m.y.z*z.z) + (m.y.w*w.z),
            (m.y.x*x.w) + (m.y.y*y.w) + (m.y.z*z.w) + (m.y.w*w.w)
        ),
        Vector4 (
            (m.z.x*x.x) + (m.z.y*y.x) + (m.z.z*z.x) + (m.z.w*w.x),
            (m.z.x*x.y) + (m.z.y*y.y) + (m.z.z*z.y) + (m.z.w*w.y),
            (m.z.x*x.z) + (m.z.y*y.z) + (m.z.z*z.z) + (m.z.w*w.z),
            (m.z.x*x.w) + (m.z.y*y.w) + (m.z.z*z.w) + (m.z.w*w.w)
        ),
        Vector4 (
            (m.w.x*x.x) + (m.w.y*y.x) + (m.w.z*z.x) + (m.w.w*w.x),
            (m.w.x*x.y) + (m.w.y*y.y) + (m.w.z*z.y) + (m.w.w*w.y),
            (m.w.x*x.z) + (m.w.y*y.z) + (m.w.z*z.z) + (m.w.w*w.z),
            (m.w.x*x.w) + (m.w.y*y.w) + (m.w.z*z.w) + (m.w.w*w.w)
        )
    );
}

Though when I tested identical C# code with Mono.Simd earlier, it 
had to be converted to something more like my previous example in 
order for SIMD to kick in. IDK if D's compiler is good enough to 
optimize the above code into SIMD ops, but I doubt it.
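
To make the restructuring concrete, here is a rough sketch (not my actual 
code) of the same m * this product written row by row with D array 
operations, so each statement works on four floats at once. Mat4 and its 
rows field are made-up stand-ins for Matrix4/Vector4, and whether a given 
compiler really turns these into SIMD instructions is an assumption I 
haven't checked.

```d
// Hypothetical stand-in for the Matrix4 above: four rows of four floats.
struct Mat4
{
    float[4][4] rows;

    // Computes m * this, matching the element order of transform above.
    Mat4 transform(const Mat4 m) const
    {
        Mat4 r;
        foreach (i; 0 .. 4)
        {
            // Result row i is the sum of this matrix's rows, each scaled
            // by one element of row i of m. Every statement is a 4-wide
            // array operation rather than four scalar expressions.
            r.rows[i][] = rows[0][] * m.rows[i][0];
            foreach (k; 1 .. 4)
                r.rows[i][] += rows[k][] * m.rows[i][k];
        }
        return r;
    }
}
```

The point is only the shape of the code: one multiply-add per row instead 
of sixteen hand-written sums.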

> Anyway, I was after a general matrix*matrix multiplication, 
> where the operands can get arbitrarily large and where any 
> potential use of __restrict is rendered unnecessary by array 
> vector ops.

I don't know enough about SIMD to confidently discuss this, but 
I'd imagine there'd have to be quite a lot of compiler magic 
happening before arbitrarily sized matrix constructs could make 
use of SIMD.
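
For reference, this is roughly what I understand the array-vector-op 
version of a general multiply to look like. It is only a sketch: matmul, 
its parameter names and the flat row-major layout are my own assumptions, 
not anything from the code under discussion. As far as I know, D requires 
the operands of an array operation not to overlap, which is what would make 
__restrict unnecessary here.

```d
// Multiplies an n-by-k matrix a by a k-by-m matrix b into the n-by-m
// matrix c. All three are flat, row-major slices (assumed layout).
void matmul(const(float)[] a, const(float)[] b, float[] c,
            size_t n, size_t k, size_t m)
{
    assert(a.length == n * k && b.length == k * m && c.length == n * m);

    c[] = 0;
    foreach (i; 0 .. n)
    {
        auto row = c[i * m .. (i + 1) * m];
        // Accumulate row i of c as a sum of rows of b, each scaled by
        // one element of row i of a. Each step is one array operation
        // over m elements, whatever m happens to be.
        foreach (p; 0 .. k)
            row[] += b[p * m .. (p + 1) * m] * a[i * k + p];
    }
}
```

Whether the inner array op actually ends up as SIMD instructions is still 
up to the compiler and runtime.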