stdio performance in tango, stdlib, and perl
Andrei Alexandrescu (See Website For Email)
SeeWebsiteForEmail at erdani.org
Wed Mar 21 16:40:15 PDT 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>
>>> Andrei Alexandrescu (See Website For Email) wrote:
>>>
>>>> 13.9s Tango
>>>> 6.6s Perl
>>>> 5.0s std.stdio
>>>
>>>
>>>
>>> There's a couple of things to look at here:
>>>
>>> 1) if there's an idiom in tango.io, it would be rewriting the example
>>> like this: Cout.conduit.copy (Cin.conduit)
>>
>> The test code assumed taking a look at each line before printing it,
>> so speed of line reading and writing was deemed as important, not
>> speed of raw I/O, which we all know how to get.
>
> Yep, just trying to isolate things
>
>>> 3) the test would appear to be stressing the parsing of lines just as
>>> much (if not more) than the io system itself. All part-and-parcel to
>>> a degree, but it may be worth investigating
>>
>>
>> I don't understand this.
>
> Just suggesting that the scanning for [\r]\n patterns is likely a good
> chunk of the CPU time
>
>>> b) foregoing the output .newline, purely as an experiment
>>
>>
>> 4.7s tcat
>
> Thanks. If tango.io were to retain CR on readln, then it would come out
> ahead of everything else in this particular test
Well, probably, but it must be tested. Newlines comprise about 3% of the file
size.
> Can you distill the benefits of retaining CR on a readline, please?
I am pasting fragments from an email to Walter. He suggested chopping the
newline at one point, and I managed to persuade him to keep it in there.
Essentially it's about information. The naive loop:
    while (readln(line)) {
        write(line);
    }
is guaranteed 100% to produce an accurate copy of its input. The version
that chops lines looks like:
    while (readln(line)) {
        writeln(line);
    }
This may or may not add a newline to the output, possibly creating a
file larger by one byte. This is the kind of imprecision that makes the
difference between a well-designed API and an almost-good one. Moreover,
with the automated chopping it is basically impossible to write a
program that exactly reproduces its input because readln essentially
loses information.
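
To make the one-byte difference concrete, here is a rough sketch in Python rather than D, chosen only because Python's readline() happens to have the same keep-the-newline, empty-at-EOF behavior. It is not the std.stdio or tango.io code; the function names copy_keeping and copy_chopping are made up for the illustration, and the chopping reader is simulated by hand.

```python
import io

def copy_keeping(text):
    """Copy line by line with a reader that keeps the newline (like readln + write)."""
    src, out = io.StringIO(text), []
    while True:
        line = src.readline()      # keeps the trailing "\n"; returns "" only at EOF
        if line == "":
            break
        out.append(line)           # write(line): emit exactly what was read
    return "".join(out)

def copy_chopping(text):
    """Copy line by line with a simulated chopping reader (like chopped readln + writeln)."""
    src, out = io.StringIO(text), []
    while True:
        line = src.readline()
        if line == "":
            break
        chopped = line[:-1] if line.endswith("\n") else line  # simulate the chop
        out.append(chopped + "\n")                            # writeln always appends "\n"
    return "".join(out)
```

With an input whose last line has no trailing newline, such as "a\nb", the first function returns the input unchanged and the second returns a string one character longer. When the input does end in a newline, the two agree, which is why the discrepancy is easy to miss.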
stdio also offers a readln() that allocates a new line on every
call. That is useful if you want fresh lines every read:
    char[] line;
    while ((line = readln()).length > 0) {
        ++dictionary[line];
    }
The code _just works_ because an empty line means _precisely_ and
without the shadow of a doubt that the file has ended. (An I/O error
throws an exception, and does NOT return an empty line; that is another
important point.) An API that uses automated chopping should not offer
such a function because an empty line may mean that an empty line was
read, or that it's eof time. So the API would force people to write
convoluted code.
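
The ambiguity can be sketched the same way, again in Python as a stand-in and not as anyone's actual library code. The names count_lines and count_lines_chopped are hypothetical, and the chopping reader is simulated by stripping the newline before the emptiness test.

```python
import io
from collections import Counter

def count_lines(text):
    """Count distinct lines with a reader that keeps the newline."""
    src, counts = io.StringIO(text), Counter()
    # An empty result can only mean EOF, because even a blank line is "\n".
    while (line := src.readline()) != "":
        counts[line] += 1
    return counts

def count_lines_chopped(text):
    """Same loop over a simulated chopping reader: it cannot tell blank from EOF."""
    src, counts = io.StringIO(text), Counter()
    while True:
        raw = src.readline()
        line = raw[:-1] if raw.endswith("\n") else raw  # simulate the chop
        if line == "":       # blank line or end of file? The value alone cannot say.
            break
        counts[line] += 1
    return counts
```

For the input "a\n\nb\na\n", the first loop counts all four lines, including the blank one. The second stops at the blank line after counting a single "a", so a chopping API would need a separate eof test or an out-of-band return value to get the loop right.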
In the couple of years I've used Perl I've thanked the Perl folks for
their readline decision numerous times.
Ever tried to do cin or fscanf? You can't do any intelligent input with
them because they skip whitespace and newlines like it's out of style.
All of my C++ applications use getline() or fgets() (both of which
thankfully do include the newline) and then process the line in-situ.
Andrei