Optimize my code =)

Fri Feb 14 08:00:07 PST 2014

Hiho,

I am fairly new to the D programming and still reading throught 
the awesome online book at http://ddili.org/ders/d.en/index.html.
However, I think it is missing some things or I am missing 
glasses and overlooked these parts in the book.^^

Currently I am trying to write a very simple Matrix library for 
some minor linear algebra operations such as calculating 
determine, some simple compositions etc. I am aware that this 
probably has been done already and I am mainly focusing learning 
D with this project. =)

I already wrote a similar Matrix library for Java and C++ with 
more or less the same algorithms in order to benchmark these 
languages.

As I am very new to D I instantly ran into certain optimizing 
issues. E.g. the simple matrix multiplication based in my java 
implementation requires about 1.5 secs for multiplying two 
1000x1000 matrices, however my D implementation requires 39 
seconds without compiler optimizations and about 14 seconds with 
-inline, -O, -noboundscheck and -release. So I am sure that I 
just missed some possible optimization routines for D and that I 
could miss entire patters used in D to write more optimized code 
- especially when looking at the const and immutability concepts.

I won't paste the whole code, just the important parts.

class Matrix(T = double) {
	private T[] data;
	private Dimension dim;
}

This is my class with its templated data as a one dimensional 
array (as I don't like jagged-arrays) and a dimension (which is a 
struct). The dimension object has some util functionality such as 
getting the total size or mapping (row, col) to a one-dimensional 
index besides the getter for rows and columns.

this(size_t rows, size_t cols) {
	this.dim = Dimension(rows, cols);
	this.data = new T[this.dim.size];
	enum nil = to!T(0);
	foreach(ref T element; this.data) element = nil;
}

This is one of the constructor. Its task is just to create a new 
Matrix instance filled with what is 0 for the templated value - 
don't know if there is a better approach for this. I experienced 
that floating point values are sadly initialized with nan which 
isn't what I wanted -> double.init = nan.

I am using opIndex and opIndexAssign in order to access and 
assign the matrix values:

T opIndex(size_t row, size_t col) const {
	immutable size_t i = this.dim.offset(row, col);
	if (i >= this.dim.size) {
		// TODO - have to learn exception handling in D first. :P
	}
	return this.data[i];
}

Which calls:

size_t size() @property const {
	return this.rows * this.cols;
}

I think and hope that this is getting optimized via inlining. =)
This works similar for opIndexAssign.

The function I am mainly benchmarking is the simple matrix 
multiplication where one of the multiplied matrices is tranposed 
first in order to improve cache hit ratio.

Matrix opMul(ref const(Matrix) other) {
	if (this.dim.rows != other.dim.cols || this.dim.cols != 
ther.dim.rows) {
		// TODO - still have to learn exception handling first ...
	}
	auto m = new Matrix(this.dim.rows, other.dim.cols);
	auto s = new Matrix(other).transposeAssign();
	size_t r1, r2, c;
	T sum;
	for (r1 = 0; r1 < this.dim.rows; ++r1) {
		for (r2 = 0; r2 < other.dim.rows; ++r2) {
			sum = to!T(0);
			for (c = 0; c < this.dim.cols; ++c) {
				sum += this[r1, c] * other[r2, c];
			}
			m[r1, r2] = sum;
		}
	}
	return m;
}

I am aware that there are faster algorithms for matrix 
multiplication but I am mainly learning how to write efficient D 
code in general and not based on any algorithm.

This is more or less the same algorithm I am using with java and 
c++. Both require about 1.5 secs (after java warm-up) to perform 
this task for two 1000x1000 matrices. D compiled with DMD takes 
about 14 seconds with all (known-to-me) optimize flag activated. 
(listed above)

I wanted to make s an immutable matrix in order to hopefully 
improve performance via this change, however I wasn't technically 
able to do this to be honest.

Besides that I can't find a way how to make a use of move 
semantics like in C++. For example a rvalue-move-constructor or a 
move-assign would be a very nice thing for many tasks.

I am not quite sure also if it is normal in D to make an idup 
function as slices have which creates an immutable copy of your 
object. And are there important things I have to do while 
implementing an idup method for this matrix class?

Another nice thing to know would be if it is possible to 
initialize an array before it is default initialized with T.init 
where T is the type of the array's fields. In C++ e.g. there is 
no default initialization which is nice if you have to initialize 
every single field anyway. E.g. in a Matrix.random() method which 
creates a matrix with random values. There it is unnecessary that 
the (sometimes huge) array is completely initialized with the 
type's init value.

Thanks in advance!

Robin