stdio line-streaming revisited

Wed Mar 28 18:52:30 PDT 2007

Last week there was a series of posts regarding some optimized code 
within phobos streams. A question posed was: without those same 
optimizations, would tango.io be slower than the improved phobos? [1]

As these new phobos IO functions are now available, Andrei's "benchmark" 
[2] was run on both Win32 and linux to see where tango.io could use some 
improvement.

The results indicate:

1) On linux, the fastest variation of the revised phobos code runs 40% 
slower than the generic tango.io equivalent. On the other hand, the new 
phobos code seems a bit faster than Perl.

2) On Win32, similar testing shows tango.io to be more than six times 
faster than the improved phobos code. Tweaking the tango.io library a 
little makes it over eight times faster than the phobos equivalent. [3]

3) On Win32, generic tango.io is more than twice as efficient as the 
fastest C version identified. It's also notably faster than MinGW 'cat', 
which apparently performs various under-the-cover optimizations.

4) By making some further optimizations in the phobos client-code using 
setvbuf() and fputs(), the improved phobos version can be sped up 
significantly; at that point tango.io is only three times faster than 
phobos on Win32. These adjustments require knowing how to tweak the 
underlying C library; thus, they may belong to the group of C++ tweaks 
which Walter quibbled with last week. The setvbuf() tweaks make no 
noticeable difference on linux, though the fputs() improvements are 
accounted for in #1 (above).
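
For anyone unfamiliar with the C-library tweaks mentioned in #4, here is 
a minimal sketch in plain C of the same idea: request full buffering via 
setvbuf(), then copy lines with fgets()/fputs(). This is not the code 
that was benchmarked. The names use_full_buffering, copy_lines and 
LINE_CAP are hypothetical, made up for illustration only.

```c
#include <stdio.h>

/* Hypothetical line capacity, mirroring the 1000-char buffer used in the
   phobos client further down. */
enum { LINE_CAP = 1000 };

/* Ask the C library to fully buffer `stream` using `bufsize` bytes.
   Passing NULL lets the library allocate the buffer itself.
   setvbuf() should be called before any other I/O on the stream.
   Returns 0 on success, nonzero on failure. */
int use_full_buffering(FILE *stream, size_t bufsize)
{
    return setvbuf(stream, NULL, _IOFBF, bufsize);
}

/* Copy `in` to `out` one fgets() chunk at a time.
   fgets() always NUL-terminates, so fputs() can be used directly;
   a line longer than LINE_CAP - 1 chars simply arrives in several chunks.
   Returns the number of chunks copied, or -1 on an I/O error. */
long copy_lines(FILE *in, FILE *out)
{
    char line[LINE_CAP];
    long chunks = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (fputs(line, out) == EOF)
            return -1;
        ++chunks;
    }
    return ferror(in) ? -1 : chunks;
}
```

A main() would presumably call use_full_buffering(stdin, 2 * 1024) and 
use_full_buffering(stdout, 2 * 1024) first, then copy_lines(stdin, stdout). 
Whether the buffering call helps at all depends on the C runtime, as the 
Win32 versus linux difference above suggests.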

Note that tango.io is not explicitly optimized for this behaviour. While 
some quick hacks to the library have been shown to make it around 20% 
faster than the generic package (for this specific test), the efficiency 
benefits are apparently derived through the approach more than anything 
else. With some changes to a core tango.io module, similar performance 
multipliers could presumably be exhibited on linux platforms also. That 
is: tango.io is relatively sedate on linux, compared to its Win32 variation.

FWIW: if some of those "Language Shootout" tests are IO-bound, perhaps 
tango.io might help? Can't imagine they'd apply that as a "language" 
test, but stranger things have happened before.

Here's the tango.io client (same as last week):

------------
import tango.io.Console;

void main()
{
   char[] content;

   while (Cin.nextLine (content, true))
          Cout (content);
}
------------

And here's the fastest phobos equivalent. Removing the setvbuf() code 
makes it consume around twice as much time on Win32. Note that this 
version is faster than the equivalent code posted last week, though 
obviously more specialized and verbose:

------------
import std.stdio;
import std.cstream;

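void main() {
     char[] buf = new char[1000];
     size_t len;
     const size_t BUFSIZE = 2 * 1024;

     // ask the C runtime for full buffering on both streams
     setvbuf(stdin, null, _IOFBF, BUFSIZE);
     setvbuf(stdout, null, _IOFBF, BUFSIZE);

     // readln() returns the number of chars read; 0 means end of input
     while ((len = readln(buf)) != 0) {
         assert(len < 1000);   // leave room for the terminator
         buf[len] = '\0';      // fputs() expects a NUL-terminated string
         fputs(buf.ptr, stdout);
     }
}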
------------

[1] Timing measurements can be supplied to those interested.

[2] The recent changes within phobos apparently stemmed from Andrei 
piping large text files through his code, and this "benchmark" is a 
reflection of that process.

[3] That ~20% optimization has been removed from the generic package at 
this time, since we feel it doesn't contribute very much to the overall 
IO picture. It can be restored if people find that necessary, and there 
is no change to client code.