How to use D parallel functions/library

Hey everyone.  A new D learner here.  So far I love D and how
much better it's working for me than C++.  One thing I like doing
is parallelizing functions, which I do in C++ using OpenMP (OMP).
Right now I'm trying to figure out how to do Conway's Game of Life
in D in parallel.  Serially, D is much faster than C++ for me, so
I feel fairly confident that it should also be faster using D's
parallelism library.

In C++ with OMP it's pretty easy to do a parallel for with a
private variable and a reduction variable, but I'm having problems
understanding how to do this in D.  Here's the meat of my parallel
code for the Game of Life.  Can y'all help me understand how to
convert this to D?

//Iterate through the 2D matrix, ignoring the border cells
//(starting at 1 and going to the matrix size)
#pragma omp for private (x) reduction (+:alive) schedule (dynamic)
		for (int i = 1; i <= sizeX; i++)
		{
			for (int j = 1; j <= sizeY; j++)
			{
				//Set x to 0, then sum all 8 of the cell's neighbors,
				//including border cells
				x = 0;
				x += matrixA[i - 1][j] + matrixA[i + 1][j]
					+ matrixA[i][j - 1] + matrixA[i][j + 1]
					+ matrixA[i - 1][j - 1] + matrixA[i - 1][j + 1]
					+ matrixA[i + 1][j - 1] + matrixA[i + 1][j + 1];

				//If cell is alive
				if (matrixA[i][j] == true)
				{
					//Cell dies if it does not have 2 or 3 neighbors
					if (x < 2 || x > 3)
						matrixB[i][j] = false;
					//Otherwise mark cell as alive in matrix B
					else
					{
						matrixB[i][j] = true;
						alive++;
					}
				}
				//If cell is not alive
				else
				{
					//Cell becomes alive if it has exactly 3 neighbors
					if (x == 3)
					{
						//Mark cell alive in matrix B
						matrixB[i][j] = true;
						alive++;
					}
					//Otherwise it stays dead
					else
						matrixB[i][j] = false;
				}
			}
		}

The matrices are bools since a cell is only alive or dead.  I keep
track of the number of alive cells so that I can see at a glance
if things are working correctly, since the same seed run for the
same number of iterations will always have the same outcome.  For
simplicity's sake, imagine that the matrices are 2002 x 2002.  The
reason there are extra rows and columns is so that I can do
wrap-around, but that's not relevant here.
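
In case it helps anyone reproduce the behavior I'm trying to port,
here is a minimal self-contained sketch of one generation in C++.
It is not my exact code: the function name step, the Grid alias and
the one-byte-per-cell storage are assumptions made for the example,
and x is simply declared inside the loop instead of being listed as
private.

```cpp
#include <vector>

// Assumption for this sketch: one byte per cell instead of
// std::vector<bool>, whose packed bits are not safe to write from
// several threads at once.
using Grid = std::vector<std::vector<unsigned char>>;

// Computes one generation from matrixA into matrixB and returns the
// number of live cells written to matrixB. Both grids are
// (sizeX + 2) x (sizeY + 2); the outer ring is the border, which is
// read but never written here.
long step(const Grid& matrixA, Grid& matrixB, int sizeX, int sizeY)
{
    long alive = 0;

    // Each thread accumulates its own copy of alive; OpenMP adds the
    // copies together at the end of the loop. If OpenMP is not enabled
    // at compile time, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:alive) schedule(dynamic)
    for (int i = 1; i <= sizeX; i++)
    {
        for (int j = 1; j <= sizeY; j++)
        {
            // Declared inside the loop body, so every iteration (and
            // therefore every thread) gets its own x.
            int x = matrixA[i - 1][j - 1] + matrixA[i - 1][j] + matrixA[i - 1][j + 1]
                  + matrixA[i][j - 1]                         + matrixA[i][j + 1]
                  + matrixA[i + 1][j - 1] + matrixA[i + 1][j] + matrixA[i + 1][j + 1];

            // A live cell survives with 2 or 3 neighbors; a dead cell
            // is born with exactly 3.
            bool live = matrixA[i][j] ? (x == 2 || x == 3) : (x == 3);

            matrixB[i][j] = live;
            alive += live;
        }
    }
    return alive;
}
```

As a sanity check, seeding a 5 x 5 interior (7 x 7 with borders) with
a three-cell horizontal blinker and calling step once should return 3,
with the blinker turned vertical in matrixB.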

I figured this would be a simple parallel foreach over an iota
range of sizeX, with int x declared inside the loop body so that I
didn't have to worry about a shared variable, but I can't get
around the alive++ reduction, and I don't understand enough about
D's reduction/parallel library.

Any ideas?  Thanks in advance for y'all's patience and assistance!


