How to best translate this C++ algorithm into D?

logicchains via Digitalmars-d digitalmars-d at puremagic.com
Sun Jun 8 01:18:15 PDT 2014


On Saturday, 7 June 2014 at 15:39:54 UTC, monarch_dodra wrote:
> ... instead of keeping "true" to your code, there is a faster 
> way than "sort" to remove duplicates. Hashing. The code becomes:
>
>     bool[forest_t] hash;
>     foreach (tree ; arr.partition!forest_invalid)
>         hash[tree] = true;
>     return hash.keys;
>
> This give me about a 90% speedup (8s => 4.5s). Note though that 
> this is not in-place at all. Also, I don't know if having the 
> ouput actually sorted was important for you, or if it was only 
> an implementation detail. In any case, if you want sorted, then 
> you can:

That's excellent, thank you. I didn't realise using a hash table 
could be so quick. The only reason I sorted it originally was 
because I was assuming that 'uniq' was like std::unique in that 
it only removes consecutive duplicates, and so requires the list 
to be sorted to remove all.

It's still slightly less than half the speed of the C++ version, 
but I'm out of ideas now. I thought the C++ version might be 
vectorised but I checked the objdump and it wasn't, and I made an 
attempt at vectorising the D version, via inline SSE, but that 
didn't turn out to be significantly faster.

I considered putting 'meal' straight into 'find_stable_forests', 
like:

auto find_stable_forests(in forest_t forest){
   forest_t[] forests = [forest];
   return forests
     .recurrence!((a,n) => a[n-1].map!(a => [forest_t(-1, -1, 
+1)+a, forest_t(-1, +1, -1)+a, forest_t(+1, -1, -1)+a])
		     .join
		     .partition!(forest_invalid)
		     .uniq)
     .until!(a => !devouring_possible(a));

But I couldn't get the types to match up.

If anyone's interested, my attempt at vectorising it was as 
follows. Interestingly, I found that taking the address of 
potentialMeals didn't work if it was made immutable or static, 
which is a pity as it seems unnecessary to allocate it every 
iteration.

   auto forestsAddr = forests.ptr;
   size_t forLen = forests.length;
   scope forest_t[] newFors = 
uninitializedArray!(forest_t[])(forLen*3);
   auto newForsAddr = newFors.ptr;
   size_t bytesToSimd = (forLen)*3*4;
   int[12] potentialMeals = [-1, -1, 1, 0, -1, 1, -1, 0, 1, -1, 
-1, 0];
   asm{
       movupd XMM0, [potentialMeals];
       movupd XMM1, [potentialMeals+16];
       movupd XMM2, [potentialMeals+32];
       mov R8, forestsAddr;
       mov R9, forestsAddr;
       add R9, bytesToSimd;
       mov RCX, newForsAddr;
   loop:;
       movupd XMM3, [R8];
       movupd XMM4, [R8];
       movupd XMM5, [R8];
       paddd XMM3, XMM0;
       paddd XMM4, XMM1;
       paddd XMM5, XMM2;
       movupd [RCX], XMM3;
       movupd [RCX+12], XMM4;
       movupd [RCX+24], XMM5;
       add RCX, 36;
       add R8, 12;
       cmp R8, R9;
       jne loop;
   }




More information about the Digitalmars-d mailing list