stdio performance in tango, stdlib, and perl

Wed Mar 21 17:21:40 PDT 2007

Derek Parnell wrote:
> On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
> Email) wrote:
> 
>>> Can you distill the benefits of retaining CR on a readline, please?
>> I am pasting fragments from an email to Walter. He suggested this at a 
>> point, and I managed to persuade him to keep the newline in there.
>>
>> Essentially it's about information. The naive loop:
>>
>> while (readln(line)) {
>>    write(line);
>> }
>>
>> is guaranteed 100% to produce an accurate copy of its input. The version 
>> that chops lines looks like:
>>
>> while (readln(line)) {
>>    writeln(line);
>> }
>>
>> This may or may not add a newline to the output, possibly creating a 
>> file larger by one byte. This is the kind of imprecision that makes the 
>> difference between a well-designed API and an almost-good one. Moreover, 
>> with the automated chopping it is basically impossible to write a 
>> program that exactly reproduces its input because readln essentially 
>> loses information.
> 
> 
> And exactly how often do people need to write this program? I would have
> thought that the need to exactly reproduce the input is kind of rare,
> because most programs read stuff to manipulate or deduce things from it,
> and not to replicate it.

Of course. It's not about reproducing the input exactly, but about 
having all of the information in the input available to the program.
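A C sketch of the same idea, using POSIX getline(), which, like the readln design described here, keeps the '\n' in the buffer. This is C rather than D, and copy_lines and last_line_terminated are hypothetical helpers written for illustration. The copy loop reproduces its input byte for byte. The second function can report whether the final line had a terminator only because the newline was kept.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Copies `in` to `out` line by line. getline() keeps the '\n' in `line`,
 * so writing each line back verbatim reproduces the input exactly,
 * including a final line that has no trailing newline.
 * Returns 0 on success, -1 on a read or write error. */
int copy_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* buffer reused (and grown) by getline */
    size_t cap = 0;
    ssize_t len;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* fwrite, not fputs: the line may contain embedded '\0' bytes */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
    }
    free(line);
    return ferror(in) ? -1 : 0;
}

/* Returns 1 if the last line of `in` ends with '\n', 0 if it does not,
 * and -1 for empty input. This is answerable only because the newline
 * is kept in each line read. */
int last_line_terminated(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    ssize_t last = -1;
    char end = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        last = len;
        end = line[len - 1];   /* len >= 1 whenever getline succeeds */
    }
    free(line);
    if (last == -1)
        return -1;
    return end == '\n';
}
```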

>> Also, stdio offers a readln() that allocates a new line on every 
>> call. That is useful if you want a fresh line on every read:
>>
>> char[] line;
>> while ((line = readln()).length > 0) {
>>    ++dictionary[line];
>> }
>>
>> The code _just works_ because an empty line means _precisely_ and 
>> without the shadow of a doubt that the file has ended. (An I/O error 
>> throws an exception, and does NOT return an empty line; that is another 
>> important point.) An API that uses automated chopping should not offer 
>> such a function because an empty line may mean that an empty line was 
>> read, or that it's eof time. So the API would force people to write 
>> convoluted code.
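To illustrate the empty-means-EOF contract, here is a hypothetical C function, readln_fresh, built on POSIX getline(). It is not a Phobos or libc API, and C has no exceptions, so it returns NULL on an I/O error instead of throwing. Because each returned line keeps its '\n', a blank input line comes back as "\n" and only end of file produces the empty string.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Returns a freshly malloc'd line that keeps its '\n' (the caller owns it
 * and it is never reused), an empty string at end of file, or NULL on an
 * I/O or allocation error. A blank line is "\n", never "". */
char *readln_fresh(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;

    if (getline(&line, &cap, in) == -1) {
        free(line);
        if (!feof(in))
            return NULL;        /* read error or out of memory */
        return calloc(1, 1);    /* "" means end of file, nothing else */
    }
    return line;
}
```

A caller can loop with `while ((line = readln_fresh(in)) != NULL && line[0] != '\0')`, keeping or freeing each line, and should also free the final empty string.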
> 
> By "convoluted", you mean something like this ...
> 
>   char[] line;
>   while ( io.readln(line) == io.Success )
>   {
>      ++dictionary[line];
>   }

I said that the API would force people to write convoluted code if it 
wanted to offer char[] readln(). As it happens, your code is buggy in the 
likely case that io.readln overwrites its buffer on each call: every key 
stored in dictionary would then refer to the same memory, which is mute 
testimony to the validity of my point :o).
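The buffer-reuse bug can be reproduced in C with fgets() and a single caller-supplied buffer. collect_keys is a hypothetical helper, and a plain array of pointers stands in for the dictionary. Storing the buffer pointer itself leaves every key aliasing the last line read, while storing a strdup() copy keeps each line intact.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads up to max_keys lines from `in` into keys[], using the caller's
 * buffer `buf` for every fgets() call.
 * copy == 0: keys[i] is `buf` itself, so each new read overwrites every
 *            key stored so far (the aliasing bug).
 * copy != 0: keys[i] is a strdup() copy that the caller must free.
 * Returns the number of keys stored. */
size_t collect_keys(FILE *in, char *buf, int bufsize,
                    char *keys[], size_t max_keys, int copy)
{
    size_t n = 0;

    while (n < max_keys && fgets(buf, bufsize, in) != NULL) {
        char *key = copy ? strdup(buf) : buf;
        if (key == NULL)        /* strdup ran out of memory */
            break;
        keys[n++] = key;
    }
    return n;
}
```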

>> In the couple of years I've used Perl I've thanked the Perl folks for 
>> their readline decision numerous times.
> 
> And yet my code nearly always looks like ...
> 
>    line = trim_right(readln());

I often do that too. And I'm glad I can remove information I don't need, 
because clearly I couldn't add back information I've lost.
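A narrower cousin of trim_right, as a hypothetical C helper named after Perl's chomp (which it only approximates). It strips only a trailing "\n" or "\r\n" and reports how many bytes it removed, so the caller decides when to discard the information. The reverse operation, restoring a terminator that readln has already dropped, cannot be written.

```c
#include <stddef.h>
#include <string.h>

/* Removes a trailing "\n" or "\r\n" from `line` in place and returns the
 * number of bytes removed (0, 1 or 2). A line without a terminator, such
 * as the last line of a file lacking a final newline, is left unchanged. */
size_t chomp(char *line)
{
    size_t len = strlen(line);
    size_t removed = 0;

    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        removed = 1;
        if (len > 0 && line[len - 1] == '\r') {
            line[--len] = '\0';
            removed = 2;
        }
    }
    return removed;
}
```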

It should be pointed out that the argument generalizes beyond 
newlines. I plan to add to Phobos two routines that efficiently and 
atomically implement the following:

read_delim(FILE*, char[] buf, dchar delim);

and

read_delim(FILE*, char[] buf, char[] delim);

For such functions, particularly the last one, it is vital that the 
delimiter is KEPT in the resulting buffer.
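For the single-byte case, POSIX getdelim() already behaves this way: the delimiter is included in the result unless end of file arrives first. The following is a hypothetical C analogue of the string-delimiter variant. The name read_delim_str and its interface are assumptions, not the planned Phobos signature. It reads until the accumulated record ends with the full delimiter. Because the delimiter is kept, a final record such as "three\r" is visibly unterminated, and its trailing '\r' is recognizably data rather than half of a "\r\n".

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Reads bytes from `in` into *buf (grown with realloc; *buf may start as
 * NULL with *cap == 0) until the record ends with `delim` or getc() hits
 * EOF or an error. The delimiter is kept and the record is '\0'-terminated.
 * Returns the record length, or -1 if no bytes were read or an
 * allocation failed. */
ssize_t read_delim_str(FILE *in, char **buf, size_t *cap, const char *delim)
{
    size_t dlen = strlen(delim);
    size_t len = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (len + 2 > *cap) {               /* room for c and the '\0' */
            size_t ncap = *cap ? *cap * 2 : 64;
            char *p = realloc(*buf, ncap);
            if (p == NULL)
                return -1;
            *buf = p;
            *cap = ncap;
        }
        (*buf)[len++] = (char)c;
        /* Stop as soon as the record ends with the complete delimiter. */
        if (dlen > 0 && len >= dlen &&
            memcmp(*buf + len - dlen, delim, dlen) == 0)
            break;
    }
    if (len == 0)
        return -1;
    (*buf)[len] = '\0';
    return (ssize_t)len;
}
```

Reading one byte at a time with getc() keeps the sketch short; an efficient version would scan the stream's buffered data for the delimiter instead.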

Andrei