stdio performance in tango, stdlib, and perl

kris foo at bar.com
Wed Mar 21 15:35:11 PDT 2007


Andrei Alexandrescu (See Website For Email) wrote:
> I've ran a couple of simple tests comparing Perl, D's stdlib (the coming 
> release), and Tango.
> 
> First, I realize I should make an account on dsource.org and post the 
> following there, but I'll mention here that it's quite disappointing 
> that Tango's idiomatic method of reading a line from the console 
> (Cin.nextLine(line) unless I missed something) chose to chop the newline 
> automatically. The Perl book spends half a page or so explaining why 
> it's _good_ that the newline is included in the line, and I've been 
> thankful for that on numerous occasions when writing Perl. Please put 
> the newline back in the line.
> 
> Anyhow, here's the code. The D up-and-coming stdio version:
> 
> import std.stdio;
> void main() {
>   char[] line;
>   while (readln(line)) {
>     write(line);
>   }
> }
> 
> The Tango version:
> 
> import tango.io.Console;
> void main() {
>   char[] line;
>   while (Cin.nextLine(line)) {
>     Cout(line).newline;
>   }
> }
> 
> (The .newline adds back the information that nextLine promptly lost, 
> sigh.) I'm not sure whether this is the idiomatic way of reading and 
> writing lines in Tango, but tango.io.Stdout seems to say so: "If you 
> don't need formatted output or unicode translation, consider using the 
> module tango.io.Console directly." - which suggests that Console would 
> be the most primitive stdio library.
> 
> The Perl version:
> #!/usr/bin/env perl
> while (<>) {
>   print;
> }
> 
> All programs operate in the same exact boring way: read a line from 
> stdin, print it, lather, rinse, repeat.
> 
> I passed a 31 MB text file (containing a dictionary that I'm using in my 
> research) through each of the programs above. The output was set to 
> /dev/null. I've ran the same program multiple times before the actual 
> test, so everything is cached and the process becomes 
> computationally-bound. Here are the results summed for 10 consecutive 
> runs (averaged over 5 epochs):
> 
> 13.9s        Tango
> 6.6s        Perl
> 5.0s        std.stdio


There are a couple of things to look at here:

1) if there's an idiom in tango.io, it would be rewriting the example 
like this:  Cout.conduit.copy (Cin.conduit)

2) the output .newline on each line will cause a flush ~ this may or may 
not account for some of the difference

3) the test would appear to be stressing the parsing of lines as much 
as (if not more than) the io system itself. All part-and-parcel to a 
degree, but it may be worth investigating
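
To make point 2 concrete, here is a rough sketch in plain C stdio (not Tango or Phobos code) of a line-echo loop where the per-line flush can be switched on or off. The function name echo_lines and the 4096-byte buffer are made up for illustration; how closely fflush() mirrors what .newline does inside Tango is an assumption.

```c
#include <stdio.h>

/* Echo `in` to `out` line by line.
 * If flush_each_line is nonzero, push the output buffer out after every
 * line; otherwise leave flushing to the stdio buffer.
 * Returns 0 on success, -1 on a read or write error. */
int echo_lines(FILE *in, FILE *out, int flush_each_line)
{
    char buf[4096];   /* hypothetical size; longer lines arrive in pieces */

    /* fgets keeps the trailing '\n', so fputs reproduces the input as-is. */
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (fputs(buf, out) == EOF)
            return -1;
        if (flush_each_line && fflush(out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Calling echo_lines(stdin, stdout, 1) and then echo_lines(stdin, stdout, 0) on the same input, with output sent to /dev/null, would show how much of the runtime the flushing alone costs in C. The output is identical either way; only the number of underlying writes changes.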

In order to track this down, we'd be interested to see the results of:

a) Cout.conduit.copy (Cin.conduit);

b) foregoing the output .newline, purely as an experiment

c) on Linux, tango.io uses the c-lib posix.read/write functions. Is that 
what phobos uses also? (on Win32, Tango uses direct Win32 calls instead)
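
For reference on (c), a whole-stream copy sitting directly on the C-library POSIX calls might look roughly like the sketch below. This is plain C, not the actual Tango or Phobos implementation; the name copy_fd and the 64 KB buffer size are made up for illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd using raw read()/write().
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int copy_fd(int in_fd, int out_fd)
{
    char buf[64 * 1024];   /* arbitrary buffer size for the sketch */

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                 /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal, retry */
            return -1;
        }

        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```

Running copy_fd(STDIN_FILENO, STDOUT_FILENO) over the same 31 MB file would give a floor for the raw IO cost with no line parsing at all, which helps separate point 3 from the IO layer itself.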

Just a heads-up: Console is not the lowest IO level. It wraps both a 
streaming-buffer and console idioms around the raw IO. Raw IO in tango 
is based around two virtual methods: read(void[]) and write(void[])


