Optimize my code =)
bearophile
bearophileHUGS at lycos.com
Mon Feb 17 14:43:55 PST 2014
Robin:
> The key to victory was pointer arithmetic, as I noticed that I
> had used it in the C++ implementation, too. xD
Perhaps with LDC2 it's not necessary.
> I will just post the whole code again so that you can see what
> I have changed.
The code looks better.
There's no need to put Dimension in another module. In D, modules
contain related code, unlike Java's one-class-per-file convention.
Also feel free to use some free-standing functions. With UFCS
they are called the same way as methods, and they help keep
classes/structs simpler.
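For example, with a hypothetical free function `doubled` (not from
your code), UFCS makes the two call styles equivalent:

```d
import std.stdio;

// A free-standing function; UFCS lets callers use member syntax.
int doubled(in int x) pure nothrow
{
    return x * 2;
}

void main()
{
    immutable x = 21;
    writeln(doubled(x)); // classic call: 42
    writeln(x.doubled);  // UFCS call, rewritten to doubled(x): 42
}
```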
Some of your imports could be moved to more local scopes, instead
of being all at module-level.
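A sketch of what a locally scoped import looks like (the function
name is hypothetical):

```d
string timeReport(in long secs, in long msecs)
{
    // The import is visible only inside this function, keeping the
    // module's top-level namespace clean.
    import std.string : format;
    return format("\tTime required: %s secs, %s msecs", secs, msecs);
}
```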
> result ~= to!string(this[r, c]);
=>
result ~= this[r, c].text;
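`std.conv.text` is just a shorthand for `to!string`; the two forms
produce identical strings, so this is purely a readability change:

```d
import std.conv : text, to;

void main()
{
    immutable n = 42;
    assert(n.text == to!string(n)); // both yield "42"
}
```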
> writeln("\tTime required: " ~ to!string(secs) ~ " secs, " ~
> to!string(msecs) ~ " msecs");
=>
writeln("\tTime required: ", secs, " secs, ", msecs, " msecs");
> ref Matrix opOpAssign(string s)(in T scalar) pure nothrow if
> (s == "*") {
=>
ref Matrix opOpAssign(string op)(in T scalar) pure nothrow if
(op == "*") {
Also you have two functions with code like:
this.data[] op= scalar;
You can define a single template (untested):
/**
 * Adds or subtracts, element-wise, the entries of the given
 * matrix to/from this one.
 */
ref Matrix opOpAssign(string op)(const ref Matrix other) pure nothrow
if (op == "+" || op == "-")
in
{
    assert(this.dim == other.dim);
}
body
{
    mixin("this.data[] " ~ op ~ "= other.data[];");
    return this;
}
> Matrix opBinary(string s)(const ref Matrix other) const pure
> if (s == "*")
Given that on a 32-bit system a Matrix is just 16 bytes, it could
be better not to accept the argument by ref, avoiding one more
level of indirection:

Matrix opBinary(string s)(in Matrix other) const pure
if (s == "*")
> However, there were some strange things of which I am very
> confused ...
>
> void allocationTest() {
> writeln("allocationTest ...");
> sw.start();
> auto m1 = Matrix!double(10000, 10000);
> { auto m2 = Matrix!double(10000, 10000); }
> { auto m2 = Matrix!double(10000, 10000); }
> { auto m2 = Matrix!double(10000, 10000); }
> //{ auto m2 = Matrix!double(10000, 10000); }
> sw.stop();
> printBenchmarks();
> }
>
> This is the most confusing code snippet. I have just changed
> the whole allocation for all m1 and m2 from new Matrix!double
> (on heap) to Matrix!double (on stack)
The matrix data is always on the heap. What ends on the stack is
a very limited amount of stuff.
> This is extremely confusing, as I allocate these matrices on the
> stack, and since I have allocated them within their own
> scoped-block they should instantly release their memory
You are mistaken: minimallyInitializedArray allocates memory on
the GC heap (and there isn't enough stack space for 10000^2
doubles anyway). In both D and Java, the deallocation of
GC-managed memory is not deterministic, so it is not released
immediately at scope exit. Also, unlike the Java GC, the D GC is
less refined, and by design it is currently not precise: with
such large arrays there are often false _inbound_ pointers that
keep the memory alive, especially on 32-bit systems. So perhaps
your problems are caused by the GC.
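If you suspect the GC, one experiment is to request a collection
explicitly after the scope closes (a minimal sketch using a plain
array in place of your Matrix):

```d
import core.memory : GC;

void main()
{
    {
        auto a = new double[](10_000 * 10_000); // ~800 MB on the GC heap
    } // `a` is out of scope here, but the memory is not yet released

    GC.collect();  // request a collection explicitly
    GC.minimize(); // try to return free pages to the OS
}
```

Even this is not guaranteed to free the block: if a stale word on
the stack happens to look like a pointer into it, the imprecise GC
keeps it alive.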
You can have deterministic memory management in your struct-based
matrices, but you have to allocate the memory manually (from the
GC, or probably better from the C heap) and free it in the struct
destructor using RAII.
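A minimal sketch of such an RAII wrapper, assuming C-heap
allocation (the name CMatrix and the layout are hypothetical,
untested against your code):

```d
struct CMatrix(T)
{
    private T* data;
    size_t rows, cols;

    this(size_t rows, size_t cols)
    {
        import core.stdc.stdlib : calloc;
        this.rows = rows;
        this.cols = cols;
        // calloc zero-initializes; the GC never scans this block.
        data = cast(T*) calloc(rows * cols, T.sizeof);
        assert(data !is null, "out of memory");
    }

    // Forbid postblit copies so free() runs exactly once.
    @disable this(this);

    ~this()
    {
        import core.stdc.stdlib : free;
        free(data); // deterministic: runs at scope exit
        data = null;
    }

    ref T opIndex(size_t r, size_t c)
    {
        return data[r * cols + c];
    }
}
```

Because the destructor runs at scope exit, the three scoped m2
blocks in your allocationTest would each release their memory
immediately.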
> Another strange things was that the new opEquals implementation:
>
> bool opEquals(const ref Matrix other) const pure nothrow {
> if (this.dim != other.dim) {
> return false;
> }
> foreach (immutable i; 0 .. this.dim.size) {
> if (this.data[i] != other.data[i]) return false;
> }
> return true;
> }
>
> is actually about 20% faster than the one you have suggested
> with the single line "return this.dim == other.dim &&
> this.data[] == other.data[];".
I think this small performance bug is fixed in dmd 2.065, which
is currently in beta 3.
> The last thing I haven't quite understood is that I tried to
> replace
>
> auto t = Matrix(other).transposeAssign();
>
> in the matrix multiplication algorithm with its shorter and
> clearer form
>
> auto t = other.transpose(); // sorry for the nasty '()', but I
> like them! :/
>
> This however gave me wonderful segmentation faults on runtime
> while using the matrix multiplication ...
This looks like a null-related bug.
I'll benchmark your code a little, but I think I don't have as
much RAM as you.
Bye,
bearophile