How to best translate this C++ algorithm into D?

Sat Jun 7 08:39:52 PDT 2014

On Saturday, 7 June 2014 at 13:31:21 UTC, logicchains wrote:
> On Saturday, 7 June 2014 at 13:25:07 UTC, Timon Gehr wrote:
>> What about (untested)?:
>>
>> static forest_t[] possible_meals = [{-1, -1, +1}, {-1, +1, 
>> -1}, {+1, -1, -1}];
>> return forests.map!(a=>possible_meals.map!(b=>b+a))
>>    .join.partition!forest_invalid.sort.uniq.array;
>>
>> (Of course, this still allocates two arrays.)
>
> That seems around 5% slower, for some reason.

Testing with dmd head, with input "300 300 300" (for what it's 
worth).

I was able to identify two "choke points". The first, is building 
the initial forest. A straight up allocation + loop gave me about 
a 25% improvement (10s => 8S)

     forest_t[] arr = 
uninitializedArray!(forest_t[])(forests.length * 3);
     size_t j = size_t.max;
     foreach ( ref const a; forests )
     {
         arr[++j] = forest_t(-1, -1, +1) + a;
         arr[++j] = forest_t(-1, +1, -1) + a;
         arr[++j] = forest_t(+1, -1, -1) + a;
     }

Then, you are using the code:
.partition!(forest_invalid).sort.uniq.array;

The "advantage" of "partition" over "filter" is that "partition" 
is a greedy inplace algorithm. But then, you go on to use "uniq", 
which is lazy, and requires "array". I implemented a (very) 
simplistic in-place "removeDuplicates":

T[] removeDuplicates(T)(T[] r)
{
     if (r.empty) return r;
     size_t i = 0, j = 1;
     for ( ; j < r.length ; ++j)
     {
        if (r[i] != r[j])
          r[++i] = r[j];
     }
     return r[0 .. ++i];
}

=> .partition!(forest_invalid).sort.removeDuplicates;

This gave me a some speed up (but nothing huge). Tweaking 
removeDuplicates to detect "runs at once" could prove to squeeze 
out more juice but ...

... instead of keeping "true" to your code, there is a faster way 
than "sort" to remove duplicates. Hashing. The code becomes:

     bool[forest_t] hash;
     foreach (tree ; arr.partition!forest_invalid)
         hash[tree] = true;
     return hash.keys;

This give me about a 90% speedup (8s => 4.5s). Note though that 
this is not in-place at all. Also, I don't know if having the 
ouput actually sorted was important for you, or if it was only an 
implementation detail. In any case, if you want sorted, then you 
can:
return hash.keys.sort();
But at that point, there has to be *some* duplication to justify 
using the hash table.

The downside to all of this, is that the solution is now 
not-so-functional:

   forest_t[] meal(forest_t[] forests) {
     forest_t[] arr = 
uninitializedArray!(forest_t[])(forests.length * 3);
     size_t j = size_t.max;
     foreach ( ref const a; forests )
     {
         arr[++j] = forest_t(-1, -1, +1) + a;
         arr[++j] = forest_t(-1, +1, -1) + a;
         arr[++j] = forest_t(+1, -1, -1) + a;
     }
     bool[forest_t] hash;
     foreach (tree ; arr.partition!forest_invalid)
         hash[tree] = true;
     return hash.keys;
   }