stdio performance in tango, stdlib, and perl

Derek Parnell derek at nomail.afraid.org
Wed Mar 21 17:11:25 PDT 2007


On Wed, 21 Mar 2007 16:40:15 -0700, Andrei Alexandrescu (See Website For
Email) wrote:

> 
>> Can you distill the benefits of retaining CR on a readline, please?
> 
> I am pasting fragments from an email to Walter. He suggested this at one 
> point, and I managed to persuade him to keep the newline in there.
> 
> Essentially it's about information. The naive loop:
> 
> while (readln(line)) {
>    write(line);
> }
> 
> is guaranteed 100% to produce an accurate copy of its input. The version 
> that chops lines looks like:
> 
> while (readln(line)) {
>    writeln(line);
> }
> 
> If the input's last line has no newline, this adds one, producing a 
> file one byte larger than the input. This is the kind of imprecision 
> that makes the difference between a well-designed API and an 
> almost-good one. Moreover, with automatic chopping it is basically 
> impossible to write a program that exactly reproduces its input, 
> because readln loses information.
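To make the information-loss argument concrete, here is a small sketch in present-day D that models the two loops over an in-memory string rather than a real stream. The helpers `splitKeepingNewline`, `copyWithWrite` and `copyWithWriteln` are hypothetical names for illustration, not part of any stdio API discussed in this thread, and the splitter assumes '\n' is the only line terminator.

```d
import std.stdio : writeln;

/// Hypothetical helper: split `input` the way a newline-keeping readln would,
/// so every piece ends in '\n' except possibly the last one.
string[] splitKeepingNewline(string input)
{
    string[] lines;
    size_t start = 0;
    foreach (i, c; input)
    {
        if (c == '\n')
        {
            lines ~= input[start .. i + 1];
            start = i + 1;
        }
    }
    if (start < input.length)
        lines ~= input[start .. $]; // unterminated last line
    return lines;
}

/// Models `while (readln(line)) write(line);` with the newline kept.
string copyWithWrite(string input)
{
    string output;
    foreach (line; splitKeepingNewline(input))
        output ~= line;
    return output;
}

/// Models a chopping readln followed by `writeln(line)`: the terminator is
/// stripped on read and unconditionally appended on write.
string copyWithWriteln(string input)
{
    string output;
    foreach (line; splitKeepingNewline(input))
    {
        string content = (line.length && line[$ - 1] == '\n') ? line[0 .. $ - 1] : line;
        output ~= content ~ "\n";
    }
    return output;
}

void main()
{
    string input = "alpha\nbeta"; // the last line has no newline
    writeln(copyWithWrite(input) == input);   // true: exact copy
    writeln(copyWithWriteln(input) == input); // false: one extra '\n' appended
}
```

When the input does end in a newline, both loops produce identical output; the difference appears only for a file whose last line is unterminated, the case Derek raises further down.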


And exactly how often do people need to write this program? I would have
thought that the need to exactly reproduce the input is kind of rare,
because most programs read stuff to manipulate or deduce things from it,
and not to replicate it.
 
> Also, stdio offers a readln() that returns a newly allocated line on 
> every call. That is useful if you want fresh lines on every read:
> 
> char[] line;
> while ((line = readln()).length > 0) {
>    ++dictionary[line];
> }
> 
> The code _just works_ because an empty line means _precisely_ and 
> without the shadow of a doubt that the file has ended. (An I/O error 
> throws an exception, and does NOT return an empty line; that is another 
> important point.) An API that chops lines automatically should not 
> offer such a function, because an empty result may mean either that a 
> blank line was read or that the end of file was reached. So the API 
> would force people to write convoluted code.

By "convoluted", you mean something like this ...

  char[] line;
  while ( io.readln(line) == io.Success )
  {
     ++dictionary[line];
  }
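The two loop shapes can be compared with a sketch over an in-memory buffer. `LineSource`, `readlnKeep` and `readlnChopped` are hypothetical names, not Phobos or Tango APIs: the first returns each line with its '\n' and an empty string only at end of input; the second strips the terminator and reports end of input through a separate `bool`, much like the `io.Success` check above.

```d
import std.stdio : writeln;

/// Hypothetical line reader over an in-memory string.
struct LineSource
{
    string rest; // unread input

    /// Newline-keeping style: a blank line comes back as "\n",
    /// so "" can only mean end of input.
    string readlnKeep()
    {
        size_t end = 0;
        while (end < rest.length && rest[end] != '\n')
            ++end;
        if (end < rest.length)
            ++end; // include the '\n'
        string line = rest[0 .. end];
        rest = rest[end .. $];
        return line;
    }

    /// Chopping style: the terminator is removed, so "" is ambiguous
    /// and end of input must be signalled by the return value instead.
    bool readlnChopped(out string line)
    {
        if (rest.length == 0)
            return false;
        line = readlnKeep();
        if (line.length && line[$ - 1] == '\n')
            line = line[0 .. $ - 1];
        return true;
    }
}

void main()
{
    enum input = "apple\n\napple\n"; // note the blank second line

    int[string] kept;
    auto a = LineSource(input);
    string line;
    while ((line = a.readlnKeep()).length > 0)
        ++kept[line];

    int[string] chopped;
    auto b = LineSource(input);
    while (b.readlnChopped(line))
        ++chopped[line];

    writeln(kept["apple\n"], " ", kept["\n"]);   // 2 1
    writeln(chopped["apple"], " ", chopped[""]); // 2 1
}
```

Both loops count the blank line; they differ only in how end of input is detected. Writing the first loop's `.length > 0` test against the chopping reader would stop at the blank line, which is the ambiguity Andrei describes.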
   


> In the couple of years I've used Perl I've thanked the Perl folks for 
> their readline decision numerous times.

And yet my code nearly always looks like ...

   line = trim_right(readln());

because I then have to parse the data contained in the line, and white space
(blank, tab and newline) at the end of a line is usually just cruft. On
the other hand, as I have to trim the line anyhow, I guess it doesn't
matter whether the routine keeps the newline or not.

Another interesting twist is that some text files omit the newline on the
last line of the file.
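The trim-then-parse pattern can be sketched in present-day D with `std.string.stripRight`, which removes trailing whitespace including '\t', '\r' and '\n'; Derek's `trim_right` is assumed to behave similarly. Because the trim makes "x\n" and "x" identical, an unterminated last line is parsed like any other. `parseRecord` and the tab-separated record format are hypothetical.

```d
import std.array : split;
import std.stdio : writeln;
import std.string : stripRight;

/// Hypothetical parser: trailing whitespace is treated as cruft and removed
/// before the line is split into tab-separated fields.
string[] parseRecord(string rawLine)
{
    return rawLine.stripRight().split("\t");
}

void main()
{
    // The same record terminated by LF (plus a stray blank), by CRLF, and
    // not terminated at all, as on the last line of some files.
    // Each call yields the same two fields.
    writeln(parseRecord("id\tname \n"));
    writeln(parseRecord("id\tname\r\n"));
    writeln(parseRecord("id\tname"));
}
```

The trade-off is the one Andrei points out: once the line has been trimmed, the program can no longer tell whether trailing blanks or a final newline were present, which is harmless for parsing but rules out exact reproduction.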

> Ever tried to do cin or fscanf? You can't do any intelligent input with 
> them because they skip whitespace and newlines like it's out of style. 
> All of my C++ applications use getline() or fgets() (both of which 
> thankfully do include the newline) and then process the line in-situ.

I conclude that we tend to write different types of apps.

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Justice for David Hicks!"
22/03/2007 10:55:43 AM



More information about the Digitalmars-d mailing list