WordCount performance
bearophile
bearophileHUGS at lycos.com
Wed Mar 26 14:17:36 PDT 2008
The following little program comes from progressively stripping down a program I was writing. These C and D versions give an approximate count of the words in a file:
D version:
import std.c.stdio: printf, getchar, EOF;
import std.ctype: isspace;

void main() {
    int count, c;

    //OUTER:
    while (1) {
        // skip whitespace until the start of the next word
        while (1) {
            c = getchar();
            if (c == EOF)
                //break OUTER;
                goto END;
            if (!isspace(c))
                break;
        }
        count++;
        // skip the rest of the current word
        while (1) {
            c = getchar();
            if (c == EOF)
                //break OUTER;
                goto END;
            if (isspace(c))
                break;
        }
    }
END:
    printf("%d\n", count);
}
C version:
#include <stdio.h>
#include <ctype.h>

int main() {
    int count = 0, c;

    while (1) {
        /* skip whitespace until the start of the next word */
        while (1) {
            c = getchar();
            if (c == EOF)
                goto END;
            if (!isspace(c))
                break;
        }
        count++;
        /* skip the rest of the current word */
        while (1) {
            c = getchar();
            if (c == EOF)
                goto END;
            if (isspace(c))
                break;
        }
    }
END:
    printf("%d\n", count);
    return 0;
}
To test them I used a 7.5 MB file of real text. The C version (compiled with MinGW 4.2.1) is about 7.8 times faster (0.43 s versus 3.35 s) than the nearly identical D code compiled with DMD 1.028. If I replace the goto with a named break in the D code (the commented-out OUTER label), the running speed is essentially the same.
On a 50 MB text file the timings are 2.43 s for C and 20.74 s for D (the C version is more than 8.5 times faster).
Disabling the GC doesn't change the running speed of the D version.
A 7-8x difference on such a simple program is big enough to make me curious: does anyone know what the problem could be? (Maybe because getchar() is called as a function instead of being expanded as a macro?)
Bye,
bearophile
More information about the Digitalmars-d mailing list