A significant performance difference

Sun Aug 31 03:55:31 PDT 2014

This is C++ code that solves one Euler problem:

--------------
#include <stdio.h>
#include <map>

const unsigned int H = 9, W = 12;

const int g[6][3] = {{7, 0, H - 3},
                      {1 + (1 << H) + (1 << (2 * H)), 0, H - 1},
                      {3 + (1 << H), 0, H - 2},
                      {3 + (2 << H), 0, H - 2},
                      {1 + (1 << H) + (2 << H), 0, H - 2},
                      {1 + (1 << H) + (1 << (H - 1)), 1, H - 1}};

int main() {
     unsigned long long p, i, k;
     unsigned int j, l;
     std::map<unsigned int, unsigned long long> x, y;
     x[0] = 1;

     for (i = 0; i < W; ++i) {
         y.clear();
         while (!x.empty()) {
             j = x.begin()->first;
             p = x.begin()->second;
             x.erase(x.begin());

             for (k = 0; k < H; ++k)
                 if ((j & (1 << k)) == 0)
                     break;

             if (k == H)
                 y[j >> H] += p;
             else
                 for (l = 0; l < 6; ++l)
                     if (k >= g[l][1] && k <= g[l][2])
                         if ((j & (g[l][0] << k)) == 0)
                             x[j + (g[l][0] << k)] += p;
         }
         x = y;
     }

     printf("%lld\n", y[0]);
     return 0;
}
--------------

I have translated it to D like this (I know in D there are nicer 
ways to write it, but I have tried to keep the look of the code 
as much similar as possible to the C++ code):

--------------
import core.stdc.stdio;

const uint H = 9, W = 12;

const uint[3][6] g = [[7, 0, H - 3],
                       [1 + (1 << H) + (1 << (2 * H)), 0, H - 1],
                       [3 + (1 << H), 0, H - 2],
                       [3 + (2 << H), 0, H - 2],
                       [1 + (1 << H) + (2 << H), 0, H - 2],
                       [1 + (1 << H) + (1 << (H - 1)), 1, H - 1]];

int main() {
     ulong p, i, k;
     uint j, l;
     ulong[uint] x, y;
     x[0] = 1;

     for (i = 0; i < W; ++i) {
         y = null;
         while (x.length) {
             j = x.byKey.front;
             p = x.byValue.front;
             x.remove(cast(int)j);

             for (k = 0; k < H; ++k)
                 if ((j & (1 << k)) == 0)
                     break;

             if (k == H)
                 y[j >> H] += p;
             else
                 for (l = 0; l < 6; ++l)
                     if (k >= g[l][1] && k <= g[l][2])
                         if ((j & (g[l][0] << k)) == 0)
                             x[j + (g[l][0] << k)] += p;
         }
         x = y;
     }

     printf("%lld\n", y[0]);
     return 0;
}
--------------

The C++ code is much faster than the D code (I see the D code 30+ 
times slower with dmd and about like 20 times with ldc2). One 
difference between the C++ and D code is that the C++ map uses a 
search tree (red-black probably), while the D code uses a hash.

To test that algorithmic difference, if I replace the map in the 
C++ code with a std::unordered_map (C++11):

#include <unordered_map>
...
std::unordered_map<unsigned int, unsigned long long> x, y;

then the run-time increases (more than two times) but it's still 
much faster than the D code.

Is it possible to fix the D code to increase its performance 
(there are associative array libraries for D, but I have not 
tried them in this program).

Bye,
bearophile