Euler problems 14, 135, 174

Thu Apr 8 19:05:26 PDT 2010

When I show D examples here I can't show large D programs because people will not read them, and I can't show 10-lines long D programs because they get ignored as useless toys. I have chosen three problems from the famous Project Euler problem set, number 14, 135 and 174:

http://projecteuler.net/index.php?section=problems&id=14
http://projecteuler.net/index.php?section=problems&id=135
http://projecteuler.net/index.php?section=problems&id=174

You can surely find a way to solve those three problems efficiently using idiomatic C language.

I have chosen to translate to D2 their Python implementation found in this blog post, "Three Pathological cases for the PyPy JIT":

http://www.ripton.net/blog/?p=51

Here I have on purpose translated Python code not written by me.

D is designed to help in the translation from C to D, but in the past I have translated Python code to D V.1 almost every day, it was a common thing. I think I am not the only person that wants to translate some Python code to D.

-------------------------------

Problem 14:

The following iterative sequence is defined for the set of positive integers:
n -> n / 2 (n is even)
n -> 3n + 1 (n is odd)
Which starting number, under one million, produces the longest chain?

That's a famous sequence. This is the original Python+Psyco implementation:

def next_num(num):
    if num & 1:
        return 3 * num + 1
    else:
        return num // 2

MAX_NUM = 1000000 # try with 5 millions too

lengths = {1: 0}

def series_length(num):
    global lengths
    if num in lengths:
        return lengths[num]
    else:
        num2 = next_num(num)
        result = 1 + series_length(num2)
        lengths[num] = result
        return result

def main():
    num_with_max_length = 1
    max_length = 0
    for ii in range(1, MAX_NUM):
        length = series_length(ii)
        if length > max_length:
            max_length = length
            num_with_max_length = ii
    print num_with_max_length, max_length

import psyco; psyco.full()

if __name__ == "__main__":
    main()

I am using Python 2.6.5 with Psyco 1.6.0final.

The translation to D2 is easy:

import std.stdio: writeln;

int next_num(int num) {
    if (num & 1)
        return 3 * num + 1;
    else
        return num / 2;
}

int[int] lengths;

static this() {
    lengths = [1: 0];
}

int series_length(int num) {
    int* p = num in lengths;
    if (p) {
        return *p;
    } else {
        int num2 = next_num(num);
        int result = 1 + series_length(num2);
        lengths[num] = result;
        return result;
    }
}

void main() {
    enum int MAX_NUM = 1_000_000; //
    //enum int MAX_NUM = 5_000_000; // s

    int num_with_max_length = 1;
    int max_length = 0;
    foreach (ii; 1 .. MAX_NUM) {
        int length = series_length(ii);
        if (length > max_length) {
            max_length = length;
            num_with_max_length = ii;
        }
    }
    writeln(num_with_max_length, " ", max_length);
}

The lengths array is not a static variable inside series_length() because it needs some initialization, that I have to perform in the static this().

If you try to run it (I am using dmd 2.042) you receive an error:
object.Error: Stack Overflow

It's a nice error because you know for sure there's a bug somewhere.
In Python the integer numbers are multi-precision, and often Euler problems use large integer numbers. So the first thing I've done is to change the ints into longs:

import std.stdio: writeln;

alias long Int;

Int next_num(Int num) {
    if (num & 1)
        return 3 * num + 1;
    else
        return num / 2;
}

Int[Int] lengths;

static this() {
    lengths = [1: 0];
}

Int series_length(Int num) {
    Int* p = num in lengths;
    if (p) {
        return *p;
    } else {
        Int num2 = next_num(num);
        Int result = 1 + series_length(num2);
        lengths[num] = result;
        return result;
    }
}

void main() {
    enum Int MAX_NUM = 1_000_000; //
    //enum Int MAX_NUM = 5_000_000; // s

    Int num_with_max_length = 1;
    Int max_length = 0;
    foreach (ii; 1 .. MAX_NUM) {
        Int length = series_length(ii);
        if (length > max_length) {
            max_length = length;
            num_with_max_length = ii;
        }
    }
    writeln(num_with_max_length, " ", max_length);
}

I can say this works because it gives the same answer of the Python program. If you don't have a Python program to compare the output to you can't be sure, because somewhere some number can be bigger than a long too, and cause silent bugs. If you don't have a Python program to compare it to you need unit tests and lot of care to be sure the result is correct.

With MAX_NUM=1_000_000 the max key in lengths is 56_991_483_520, so it's truly a sparse associative array. You can't use just a normal dynamic array to represent its key-values.

When the D2 program is debugged I compile with -O -release -inline.

The Python+Psyco program prints:
837799 524
It runs in 1.60 seconds on Windows 32 bit, 2.3 GHz CPU.

The D2 version prints the same output:
837799 524
It runs in 5.40 seconds.

The cause of the significantly lower performance of the D code is surely the Python dicts that are far more optimized, despite the interpter and JIT costs some time.

Putting __gshared before the definition of lengths doesn't change the running time of the D code.

I will be glad to benchmark the bigints of dmd 2.043. (I hope Don has implemented the small int optimization I suggested him. The less significant bit of a 32/64 bit union can be used as a boolean flag to tell apart a small int allocated in place on the stack, from a pointer to heap-allocated big int).

-------------------------------

Problem 135:

Given the positive integers, x, y, and z, are consecutive terms of an
arithmetic progression, the least value of the positive integer, n, for which
the equation, x**2 - y**2 - z**2 = n, has exactly two solutions is n = 27:

34**2 - 27**2 - 20**2 = 12**2 - 9**2 - 6**2 = 27

It turns out that n = 1155 is the least value which has exactly ten solutions.

How many values of n less than one million have exactly ten distinct solutions?

The Python code is not complex:

def main():
    limit = 1000000
    counts = limit * [0]
    for d in range(1, limit / 4 + 1):
        d24 = d ** 2 * 4
        if d24 < limit:
            start = 1
            counts[d24] += 1
        else:
            start = int((d24 - limit) ** 0.5)
        if start < 1:
            start = 1
        for root in xrange(start, 2 * d):
            n = d24 - root ** 2
            if n <= 0:
                break
            if n < limit:
                z = root + d
                y = z + d
                x = y + d
                counts[n] += 1
                if d > root:
                    z = d - root
                    y = z + d
                    x = y + d
                    counts[n] += 1
    total = 0
    assert counts[0] == 0
    for val in counts:
        if val == 10:
            total += 1
    print total

import psyco; psyco.full()

if __name__ == "__main__":
    main()

The translation to D2 is easy, it uses the useful ^^ operator:

import std.stdio: writeln;
import std.math;

void main() {
    enum int limit = 1_000_000;
    auto counts = new int[limit];

    foreach (d; 1 .. limit / 4 + 1) {
        int d24 = d ^^ 2 * 4; // line 9
        int start;
        if (d24 < limit) {
            start = 1;
            counts[d24]++; // line 13
        } else
            start = cast(int)((d24 - limit) ^^ 0.5);

        if (start < 1)
            start = 1;

        foreach (root; start .. 2 * d) {
            int n = d24 - root ^^ 2;
            if (n <= 0)
                break;
            if (n < limit) {
                int z = root + d;
                int y = z + d;
                int x = y + d;
                counts[n]++;
                if (d > root) {
                    z = d - root;
                    y = z + d;
                    x = y + d;
                    counts[n]++;
                }
            }
        }
    }

    int total;
    assert(counts[0] == 0);
    foreach (val; counts)
        if (val == 10)
            total++;
    writeln(total);
}

I have imported std.math as a workaround to a bug (already in bugzilla), because ^^0.5 can't find the sqrt.

The D2 program produces a run time bug (thanks to array bound tests):
core.exception.RangeError at temp(13): Range violation

Some short debugging shows that line 9 has an integer overflow:
Values of d24:
...
2146654224
2146839556
2147024896
2147210244
2147395600
-2147386332 => error

When d24 becomes negative this if(d24<limit) is true, so this line runs:
counts[d24]++;
And it causes the out of array bounds error.

So I can replace the ints with longs as before, but then I have three compile errors because you can't index array items with a long. So I have to use some casts, this is the new code:

import std.stdio: writeln;
import std.math;

void main() {
    enum long limit = 1_000_000;
    auto counts = new long[limit];

    foreach (d; 1 .. limit / 4 + 1) {
        long d24 = d ^^ 2 * 4;
        long start;
        if (d24 < limit) {
            start = 1;
            counts[cast(uint)d24]++;
        } else
            start = cast(long)((d24 - limit) ^^ 0.5);

        if (start < 1)
            start = 1;

        foreach (root; start .. 2 * d) {
            long n = d24 - root ^^ 2;
            if (n <= 0)
                break;
            if (n < limit) {
                long z = root + d;
                long y = z + d;
                long x = y + d;
                counts[cast(uint)n]++;
                if (d > root) {
                    z = d - root;
                    y = z + d;
                    x = y + d;
                    counts[cast(uint)n]++;
                }
            }
        }
    }

    long total;
    assert(counts[0] == 0);
    foreach (val; counts)
        if (val == 10)
            total++;
    writeln(total);
}

This time the D program is much faster, it runs in 0.2 seconds, while the Psyco version needs 2.47 seconds.
Both print the 4989 result.

-------------------------------

Problem 174:

We shall define a square lamina to be a square outline with a square "hole" so
that the shape possesses vertical and horizontal symmetry.

Given eight tiles it is possible to form a lamina in only one way: 3x3 square
with a 1x1 hole in the middle. However, using thirty-two tiles it is possible
to form two distinct laminae.

http://projecteuler.net/project/images/p_173_square_laminas.gif

If t represents the number of tiles used, we shall say that t = 8 is type L(1)
and t = 32 is type L(2).

Let N(n) be the number of t <= 1000000 such that t is type L(n); for example,
N(15) = 832.

What is sum(N(n)) for 1 <= n <= 10?

The Python implementation:

from math import ceil
from collections import defaultdict

def gen_laminae(limit):
    """Yield laminae with up to limit squares, as tuples
    (outer, inner, squares)"""
    for outer in range(3, limit // 4 + 2):
        if outer & 1:
            min_min_inner = 1
        else:
            min_min_inner = 2
        min_inner_squared = outer ** 2 - limit
        if min_inner_squared < 0:
            min_inner = min_min_inner
        else:
            min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5)))
            if outer & 1 != min_inner & 1:
                min_inner += 1
        outer_squared = outer ** 2
        for inner in range(min_inner, outer - 1, 2):
            squares = outer_squared - inner ** 2
            yield (outer, inner, squares)

def main():
    squares_to_count = defaultdict(int)
    for (outer, inner, squares) in gen_laminae(1000000):
        squares_to_count[squares] += 1
    histogram = defaultdict(int)
    for val in squares_to_count.values():
        if val <= 10:
            histogram[val] += 1
    print sum(histogram.values())

if __name__ == "__main__":
    main()

The defaultdict is like a D associative array, you can for example increment a value even if the key is not already present (this is not allowed in normal python dict). If a value is not already present uses the given "constructor" to create a default value. Here it uses int, calling int() it returns 0.

Here I am not sure what to do, I can translate that code to D2 as it is, but I don't like that code too much, because it returns a tuple (outer,inner,squares) but then the main() uses just its squares item. This wastes some time to build all those tuples. So in theory I can translate to D both that Python version with the 3-tuple and a versione that yields a single squares value. To keep things simpler I translate just a Python version that yields a single number (I have not used Psyco because here it worsens the performance a bit):

from math import ceil
from collections import defaultdict

def gen_laminae(limit):
    for outer in range(3, limit // 4 + 2):
        if outer & 1:
            min_min_inner = 1
        else:
            min_min_inner = 2
        min_inner_squared = outer ** 2 - limit
        if min_inner_squared < 0:
            min_inner = min_min_inner
        else:
            min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5)))
            if (outer & 1) != (min_inner & 1):
                min_inner += 1
        outer_squared = outer ** 2
        for inner in range(min_inner, outer - 1, 2):
            squares = outer_squared - inner ** 2
            yield squares

def main():
    squares_to_count = defaultdict(int)
    for squares in gen_laminae(1000000):
        squares_to_count[squares] += 1
    histogram = defaultdict(int)
    for val in squares_to_count.values():
        if val <= 10:
            histogram[val] += 1
    print sum(histogram.values())

if __name__ == "__main__":
    main()

First translation to D:

import std.stdio: writeln;
import std.math: ceil;
import std.algorithm: max;
import std.math; // for ^^0.5

/// Yield laminae with up to limit squares, as squares
struct GenLaminae {
    int limit;

    int opApply(int delegate(ref int) dg) {
        int result;

        foreach (outer; 3 .. limit / 4 + 2) {
            int min_min_inner;
            if (outer & 1)
                min_min_inner = 1;
            else
                min_min_inner = 2;
            int min_inner_squared = outer ^^ 2 - limit;
            int min_inner;
            if (min_inner_squared < 0) {
                min_inner = min_min_inner;
            } else {
                min_inner = max(min_min_inner, cast(int)ceil(min_inner_squared ^^ 0.5));
                if ((outer & 1) != (min_inner & 1))
                    min_inner++;
            }
            int outer_squared = outer ^^ 2;

            // foreach can't be used here
            for (int inner = min_inner; inner < (outer - 1); inner += 2) {
                int squares = outer_squared - inner ^^ 2;
                result = dg(squares);
                if (result)
                    break;
            }
        }

        return result;
    }
}

void main() {
    int[int] squares_to_count;
    foreach (squares; GenLaminae(1_000_000))
        squares_to_count[squares]++;

    int[int] histogram;
    foreach (val; squares_to_count)
        if (val <= 10)
            histogram[val]++;

    int tot;
    foreach (val; histogram)
        tot += val;
    writeln(tot);
}

At the end of the main() I have used a foreach because Phobos2 seems to miss a sum(). It has reduce(), but reduce is less easy to use and it introduces a cognitive burden that's small but not small enough, so using the foreach is simpler. I suggest to add a sum() to Phobos2, if it's missing.

The D2 program doesn't seem to stop. Probably some other invisible overflow.

So crossing my fingers I have used longs instead (the opApply uses ints still, it just yields the long):

import std.stdio: writeln;
import std.math: ceil;
import std.algorithm: max;
import std.math; // for ^^0.5

/// Yield laminae with up to limit squares, as squares
struct GenLaminae {
    long limit;

    int opApply(int delegate(ref long) dg) {
        int result;

        foreach (outer; 3 .. limit / 4 + 2) {
            long min_min_inner;
            if (outer & 1)
                min_min_inner = 1;
            else
                min_min_inner = 2;
            long min_inner_squared = outer ^^ 2 - limit;
            long min_inner;
            if (min_inner_squared < 0) {
                min_inner = min_min_inner;
            } else {
                min_inner = max(min_min_inner, cast(long)ceil(min_inner_squared ^^ 0.5));
                if ((outer & 1) != (min_inner & 1))
                    min_inner++;
            }
            long outer_squared = outer ^^ 2;

            // foreach can't be used here
            for (long inner = min_inner; inner < (outer - 1); inner += 2) {
                long squares = outer_squared - inner ^^ 2;
                result = dg(squares);
                if (result)
                    break;
            }
        }

        return result;
    }
}

void main() {
    long[long] squares_to_count;
    foreach (squares; GenLaminae(1_000_000))
        squares_to_count[squares]++;

    long[long] histogram;
    foreach (val; squares_to_count)
        if (val <= 10)
            histogram[val]++;

    long tot;
    foreach (val; histogram)
        tot += val;
    writeln(tot);
}

By chance it seems to not overflow longs, and it gives the right output. The result is 209566.
The D programs runs in 0.77 seconds, the Psyco version in 3.66 seconds.

As a final test I have translated the D code to C#2, for mono. First I have translated the wrong version, that uses ints:

using System;
using System.Collections.Generic;

static class Euler174 {
    static IEnumerable<int> GenLaminae(int limit) {
        checked {
            for (int outer = 3; outer < limit / 4 + 2; outer++) {
                int min_min_inner;
                if ((outer & 1) != 0)
                    min_min_inner = 1;
                else
                    min_min_inner = 2;
                int min_inner_squared = outer * outer - limit;
                int min_inner;
                if (min_inner_squared < 0) {
                    min_inner = min_min_inner;
                } else {
                    min_inner = Math.Max(min_min_inner,
                                         (int)Math.Ceiling(Math.Sqrt(min_inner_squared)));
                    if ((outer & 1) != (min_inner & 1))
                        min_inner++;
                }
                int outer_squared = outer * outer;

                for (int inner = min_inner; inner < (outer - 1); inner += 2) {
                    int squares = outer_squared - inner * inner;
                    yield return squares;
                }
            }
        }
    }

    static void Main() {
        checked {    
            var squares_to_count = new Dictionary<int, int>();
            foreach (var squares in GenLaminae(1000000)) {
                if (squares_to_count.ContainsKey(squares))
                    squares_to_count[squares]++;
                else
                    squares_to_count.Add(squares, 1);
            }

            var histogram = new Dictionary<int, int>();
            foreach (var val in squares_to_count.Values) {
                if (val <= 10) {
                    if (histogram.ContainsKey(val)) // slow double lookup?
                        histogram[val]++;
                    else
                        histogram.Add(val, 1);
                }
            }

            int tot = 0;
            foreach (var val in histogram.Values)
                tot += val;
            Console.WriteLine("{0} ", tot);
        }
    }
}

At runtime it stops in about a second and gives a nice OverflowException, so I know the program is wrong:

Unhandled Exception: System.OverflowException: Number overflow.
  at Euler174+<GenLaminae>c__Iterator0.MoveNext () [0x00000] 
  at Euler174.Main () [0x00000] 

So I can try with longs, and this time it throws no exceptions, so longs are enough! (There can be other kind of bugs, of course, but one big source of bugs is killed):

using System;
using System.Collections.Generic;

static class Euler174 {
    static IEnumerable<long> GenLaminae(long limit) {
        checked {
            for (long outer = 3; outer < limit / 4 + 2; outer++) {
                long min_min_inner;
                if ((outer & 1) != 0)
                    min_min_inner = 1;
                else
                    min_min_inner = 2;
                long min_inner_squared = outer * outer - limit;
                long min_inner;
                if (min_inner_squared < 0) {
                    min_inner = min_min_inner;
                } else {
                    min_inner = Math.Max(min_min_inner,
                                         (long)Math.Ceiling(Math.Sqrt(min_inner_squared)));
                    if ((outer & 1) != (min_inner & 1))
                        min_inner++;
                }
                long outer_squared = outer * outer;

                for (long inner = min_inner; inner < (outer - 1); inner += 2) {
                    long squares = outer_squared - inner * inner;
                    yield return squares;
                }
            }
        }
    }

    static void Main() {
        checked {
            var squares_to_count = new Dictionary<long, long>();
            foreach (var squares in GenLaminae(1000000)) {
                if (squares_to_count.ContainsKey(squares))
                    squares_to_count[squares]++;
                else
                    squares_to_count.Add(squares, 1);
            }

            var histogram = new Dictionary<long, long>();
            foreach (var val in squares_to_count.Values) {
                if (val <= 10) {
                    if (histogram.ContainsKey(val)) // slow double lookup?
                        histogram[val]++;
                    else
                        histogram.Add(val, 1);
                }
            }

            long tot = 0;
            foreach (var val in histogram.Values)
                tot += val;
            Console.WriteLine("{0} ", tot);
        }
    }
}

That C# code runs in 0.52 seconds (Mono C# compiler, gmcs, probably version 2.6.3. Using -optimize+ changes nothing. I have not tried on dotnet that's faster). I have done few small tests and I've seen that most of the time is spent in the Main(), so here the lower performance of the D code (the D version needs 0.77 seconds to run) is again caused by slower associative arrays.

Note: If I comment out the checked{ the running time is about the same 0.52-0.53 seconds. Those checks don't slow down this program (because most of the time is spent in the associative array logic). In general the overhead of such tests is minimal.

Here C# shows to be safer and faster, even on mono.

This post shows that if you want to write (even simple) numerical code that uses integral numbers, you need to use everywhere efficient multi-precision integers, or at least you need integer overflows at runtime. Otherwise you are programming in the darkness.

Good luck with your programs,
bearophile