WordCount performance

Wed Mar 26 14:17:36 PDT 2008

The following little program is a progressively stripped-down version of a program I was writing. The C and D versions below both give an approximate count of the words in a file (read from standard input):

D version:

import std.c.stdio: printf, getchar, EOF;
import std.ctype: isspace;

void main() {
    int count, c;

  //OUTER:
    while (1) {
        while (1) {
            c = getchar();
            if (c == EOF)
                //break OUTER;
                goto END;
            if (!isspace(c))
                break;
        }

        count++;

        while (1) {
            c = getchar();
            if (c == EOF)
                //break OUTER;
                goto END;
            if (isspace(c))
                break;
        }
    }

  END:
    printf("%d\n", count);
}

C version:

#include <stdio.h>
#include <ctype.h>

int main() {
    int count = 0, c;

    while (1) {
        while (1) {
            c = getchar();
            if (c == EOF)
                goto END;
            if (!isspace(c))
                break;
        }

        count++;

        while (1) {
            c = getchar();
            if (c == EOF)
                goto END;
            if (isspace(c))
                break;
        }
    }

    END:
    printf("%d\n", count);
    return 0;
}

To test it, I used a 7.5 MB file of real text. The C version (compiled with MinGW 4.2.1) is ~7.8 times faster (0.43 s versus 3.35 s) than the very similar D code compiled with DMD (1.028). If I use a named break in the D code (the commented-out OUTER label) to avoid the goto, the running speed is essentially the same.
On a 50 MB file of text the timings are 2.43 s and 20.74 s (C version 8.5+ times faster).
Disabling the GC doesn't change the running speed of the D version.
A 7-8 times difference on such a simple program is big enough to make me curious. Do you know what the problem could be? (Maybe getchar() being a function instead of a macro?)
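
One way to test the getchar() hypothesis might be to take getchar() out of the inner loop entirely: read the input in large chunks with fread() and scan the buffer, so the per-character cost is an array access instead of a call. The sketch below is only an illustration under that assumption; count_words_buffered and the 64 KiB chunk size are names and values I made up for it, and it does not prove that getchar() is the actual cause.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical helper (not part of the programs above): counts
   whitespace-separated words by reading the stream in chunks with
   fread(), so no function is called once per character for I/O. */
long count_words_buffered(FILE *fp) {
    static unsigned char buf[1 << 16]; /* 64 KiB chunk, an arbitrary choice */
    long count = 0;
    int in_word = 0;                   /* state carried across chunk boundaries */
    size_t i, n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; i++) {
            if (isspace(buf[i])) {
                in_word = 0;           /* whitespace ends the current word */
            } else if (!in_word) {
                in_word = 1;           /* first non-space byte starts a new word */
                count++;
            }
        }
    }
    return count;
}
```

Calling count_words_buffered(stdin) from main() and printing the result with "%ld\n" should give the same count as the two versions above, since a word is still any run of non-whitespace characters. If a similar chunked loop in D ran close to the C speed, that would point at the per-character getchar() call rather than at the loop itself.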

Bye,
bearophile