Casting MapResult

jmh530 via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Jun 22 18:27:17 PDT 2015


On Tuesday, 16 June 2015 at 16:37:35 UTC, John Colvin wrote:
> If you want really fast exponentiation of an array though, you 
> want to use SIMD. Something like http://www.yeppp.info would be 
> easy to use from D.

I've been looking into SIMD a little. It turns out that core.simd 
only works for DMD on Linux machines. I'm not sure about the other 
compilers, but I was stuck on that for a bit. I read up on SIMD 
since I had no real understanding of it before you mentioned it. 
At least now I understand why all the types in core.simd are so 
small. My initial reaction was that there's no way I would want to 
write code just for float[4], but now I'm like "oh, that's the 
whole point".

Anyway, I might try to put something together on my other machine 
one of these days, but I was able to make a bit more progress 
with D's std.parallelism. The parallel foreach loops work great, 
even on Windows, with little extra work required.

That being said, I'm not seeing any speed-up from parallel map. I 
put some code below doing some variations on std.algorithm.map 
and taskPool.map. The more memory allocation there is (through 
.array), the longer everything takes. Keeping things as ranges 
seems to be much faster.

The most interesting result to me was that taskPool.map was 
slower than std.algorithm.map in each case. Maybe that's the 
difference between being semi-eager and lazy. The code below 
doesn't show it, but the parallel foreach loop seems to be faster 
than std.algorithm.map or taskPool.map when doing everything with 
arrays; a sketch of roughly what I mean follows this paragraph.
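
The parallel foreach version looks something like this (call it f8; 
an untested sketch in the same style as the functions below, so I'd 
have to benchmark it properly to back up the claim):

void f8()
{
	auto x = iota(x_size).array;
	auto y = new real[x.length];
	// Each worker computes exp for its own chunk of the input array.
	foreach (i, a; taskPool.parallel(x))
		y[i] = exp(a);
}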



import std.datetime;
import std.parallelism;
import std.algorithm;
import std.conv : to;
import std.math : exp;
import std.stdio : writeln;
import std.array : array;
import std.range : iota;

enum real x_size = 100_000;

void f0()
{
	// Lazy map over a lazy range; nothing is evaluated or allocated here.
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size));
}

void f1()
{
	// Parallel map over a lazy range.
	auto y = taskPool.map!exp(iota(x_size));
}

void f2()
{
	// Lazy map, then copy the results into a newly allocated array.
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size)).array;
}

void f3()
{
	// Parallel map, then copy the results into a newly allocated array.
	auto y = taskPool.map!exp(iota(x_size)).array;
}

void f4()
{
	// Allocate the input array first, then lazy map over it.
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size).array);
}

void f5()
{
	// Allocate the input array first, then parallel map over it.
	auto y = taskPool.map!exp(iota(x_size).array);
}

void f6()
{
	// Allocate the input array, lazy map, then copy the results into another array.
	auto y = std.algorithm.map!(a => exp(a))(iota(x_size).array).array;
}

void f7()
{
	// Allocate the input array, parallel map, then copy the results into another array.
	auto y = taskPool.map!exp(iota(x_size).array).array;
}

void main() {
	auto r = benchmark!(f0, f1, f2, f3, f4, f5, f6, f7)(100);
	auto f0Result = to!Duration(r[0]);
	auto f1Result = to!Duration(r[1]);
	auto f2Result = to!Duration(r[2]);
	auto f3Result = to!Duration(r[3]);
	auto f4Result = to!Duration(r[4]);
	auto f5Result = to!Duration(r[5]);
	auto f6Result = to!Duration(r[6]);
	auto f7Result = to!Duration(r[7]);
	writeln(f0Result);			//prints ~ 17us on my machine
	writeln(f1Result);			//prints ~ 4.3ms on my machine
	writeln(f2Result);			//prints ~ 1.7s on my machine
	writeln(f3Result);			//prints ~ 3.5s on my machine
	writeln(f4Result);			//prints ~ 471ms on my machine
	writeln(f5Result);			//prints ~ 473ms on my machine
	writeln(f6Result);			//prints ~ 1.9s on my machine
	writeln(f7Result);			//prints ~ 3.9s on my machine
}
