stdio performance in tango, stdlib, and perl
Andrei Alexandrescu (See Website For Email)
SeeWebsiteForEmail at erdani.org
Wed Mar 21 17:21:40 PDT 2007
Derek Parnell wrote:
> On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
> Email) wrote:
>
>>> Can you distill the benefits of retaining CR on a readline, please?
>> I am pasting fragments from an email to Walter. He suggested chopping
>> the newline at one point, and I managed to persuade him to keep it in
>> there.
>>
>> Essentially it's about information. The naive loop:
>>
>> while (readln(line)) {
>>     write(line);
>> }
>>
>> is guaranteed 100% to produce an accurate copy of its input. The version
>> that chops lines looks like:
>>
>> while (readln(line)) {
>>     writeln(line);
>> }
>>
>> This may or may not add a newline to the output, possibly creating a
>> file larger by one byte. This is the kind of imprecision that makes the
>> difference between a well-designed API and an almost-good one. Moreover,
>> with the automated chopping it is basically impossible to write a
>> program that exactly reproduces its input because readln essentially
>> loses information.
>
>
> And exactly how often do people need to write this program? I would have
> thought that the need to exactly reproduce the input is kind of rare,
> because most programs read stuff to manipulate or deduce things from it,
> and not to replicate it.

Of course. It's not about reproducing the input exactly, but about
having all of the information in the input available to the program.
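
To make the information argument concrete, here is a minimal C sketch of the two loops quoted above. It is an illustration under assumptions, not the stdio code under discussion: POSIX getline() stands in for a readln that keeps the terminator, and the names copy_keeping_newline, copy_chopping_newline, and copied_size are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Copy `in` to `out` line by line, writing each terminator exactly as read. */
static void copy_keeping_newline(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;

    assert(in != NULL && out != NULL);
    while ((len = getline(&line, &cap, in)) != -1)
        fwrite(line, 1, (size_t)len, out);
    free(line);
}

/* Same loop, emulating a reader that chops the newline plus a writeln. */
static void copy_chopping_newline(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;

    assert(in != NULL && out != NULL);
    while ((len = getline(&line, &cap, in)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            --len;                  /* the terminator is discarded here */
        fwrite(line, 1, (size_t)len, out);
        fputc('\n', out);           /* writeln always appends a newline */
    }
    free(line);
}

/* Hypothetical helper: run `copy` over `input` and return the output size. */
static long copied_size(void (*copy)(FILE *, FILE *), const char *input)
{
    FILE *in = tmpfile();
    FILE *out = tmpfile();
    long size;

    assert(in != NULL && out != NULL);
    fputs(input, in);
    rewind(in);
    copy(in, out);
    size = ftell(out);
    fclose(in);
    fclose(out);
    return size;
}
```

With these definitions, the inputs "a\nb" and "a\nb\n" yield outputs of different sizes through copy_keeping_newline but the same size through copy_chopping_newline, so the chopping version can no longer tell whether the last line was terminated.
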

>> stdio also offers a readln() that returns a freshly allocated line on
>> every call. That is useful if you want fresh lines every read:
>>
>> char[] line;
>> while ((line = readln()).length > 0) {
>>     ++dictionary[line];
>> }
>>
>> The code _just works_ because an empty line means _precisely_ and
>> without the shadow of a doubt that the file has ended. (An I/O error
>> throws an exception, and does NOT return an empty line; that is another
>> important point.) An API that uses automated chopping should not offer
>> such a function because an empty line may mean that an empty line was
>> read, or that it's eof time. So the API would force people to write
>> convoluted code.
>
> By "convoluted", you mean something like this ...
>
> char[] line;
> while ( io.readln(line) == io.Success )
> {
>     ++dictionary[line];
> }

I said that the API would force people to write convoluted code if it
wanted to offer char[] readln(). Consequently, your code is buggy in the
likely case that io.readln overwrites its buffer (every key stored in
dictionary would then alias the same, repeatedly overwritten storage),
which is mute testimony to the validity of my point :o).
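
The buffer-reuse hazard can be shown with a small C analogue. This is only a sketch: fgets() into a caller-supplied buffer plays the role of a reader that overwrites its buffer, a plain array stands in for the dictionary, and collect_aliased, collect_copied, MAX_LINES, and LINE_CAP are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_LINES = 16, LINE_CAP = 256 };

/* Buggy: stores a pointer into the reused buffer, so all entries alias it. */
static size_t collect_aliased(FILE *in, char *buf, const char *out[MAX_LINES])
{
    size_t n = 0;

    assert(in != NULL && buf != NULL);
    while (n < MAX_LINES && fgets(buf, LINE_CAP, in) != NULL)
        out[n++] = buf;             /* same address on every iteration */
    return n;
}

/* Fixed: copies each line, so later reads cannot clobber stored entries.
   The caller owns the copies and is responsible for freeing them. */
static size_t collect_copied(FILE *in, char *buf, char *out[MAX_LINES])
{
    size_t n = 0;

    assert(in != NULL && buf != NULL);
    while (n < MAX_LINES && fgets(buf, LINE_CAP, in) != NULL) {
        size_t len = strlen(buf) + 1;   /* include the terminating NUL */
        out[n] = malloc(len);
        assert(out[n] != NULL);
        memcpy(out[n], buf, len);
        ++n;
    }
    return n;
}
```

After collect_aliased runs over a two-line file, both stored entries refer to whatever the buffer held last. A reader that hands back a fresh line per call avoids the explicit copy in collect_copied.
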
>> In the couple of years I've used Perl I've thanked the Perl folks for
>> their readline decision numerous times.
>
> And yet my code nearly always looks like ...
>
> line = trim_right(readln());

I often do that too. And I'm glad I can remove information I don't need,
because clearly I couldn't add back information I've lost.

It should be pointed out that my point generalizes to more than
newlines. I plan to add to Phobos two routines that efficiently and
atomically implement the following:

    read_delim(FILE*, char[] buf, dchar delim);

and

    read_delim(FILE*, char[] buf, char delim[]);

For such functions, particularly the last one, it is vital that the
delimiter is KEPT in the resulting buffer.
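
As a rough illustration of why the delimiter should stay in the buffer, here is a C sketch of the multi-character variant. It is not the planned Phobos routine: the names read_delim_str and ends_with_delim, the fixed-capacity buffer, and the byte-at-a-time fgetc() loop are all assumptions made for brevity, and the sketch is neither efficient nor atomic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: read from `in` into `buf` (capacity `cap`) up to and
   including the first occurrence of `delim`. Returns the number of bytes
   stored, 0 at end of file. The result is always NUL-terminated. */
static size_t read_delim_str(FILE *in, char *buf, size_t cap, const char *delim)
{
    size_t dlen = strlen(delim);
    size_t len = 0;
    int c;

    assert(in != NULL && buf != NULL && cap > 0 && dlen > 0);
    while (len + 1 < cap && (c = fgetc(in)) != EOF) {
        buf[len++] = (char)c;
        if (len >= dlen && memcmp(buf + len - dlen, delim, dlen) == 0)
            break;                  /* delimiter seen; it stays in buf */
    }
    buf[len] = '\0';
    return len;
}

/* Nonzero if the record in buf[0..len) ends with the delimiter. */
static int ends_with_delim(const char *buf, size_t len, const char *delim)
{
    size_t dlen = strlen(delim);

    return len >= dlen && memcmp(buf + len - dlen, delim, dlen) == 0;
}
```

Because the delimiter is kept, a return value of 0 can only mean end of file, and ends_with_delim lets the caller distinguish a complete record from a final, unterminated one (or one cut short by a full buffer). A version that stripped the delimiter could not tell "c" at end of input apart from "c--".
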
Andrei