stdio performance in tango, stdlib, and perl

Fri Mar 23 02:08:24 PDT 2007

Andrei Alexandrescu (See Website For Email) wrote:
> Roberto Mariottini wrote:
[...]
>>> Essentially it's about information. The naive loop:
>>>
>>> while (readln(line)) {
>>>   write(line);
>>> }
>>
>> I'm completely against that awful mess of code.
> 
> What exactly would be bad about it?

It's not evident to a non-expert programmer that a newline is
appended to each line.
Take any programmer from any language of your choice and ask what this
snippet is supposed to do.
This works against immediate comprehension of the code.

>>> is guaranteed 100% to produce an accurate copy of its input. The 
>>> version that chops lines looks like:
>>>
>>> while (readln(line)) {
>>>   writeln(line);
>>> }
>>>
>>> This may or may not add a newline to the output, possibly creating a 
>>> file larger by one byte.
>>
>> Are you sure? Can you elaborate more on this?
> 
> Very simple. If the file ends with a newline, the code reproduces it. If 
> not, the code gratuitously appends a newline.

A newline is two bytes here.
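
To make the size difference concrete, here is a minimal sketch in C (not D, but the same stdio behaviour is at issue) of the chop-then-writeln loop, working on in-memory strings. The helper name copy_chopped is hypothetical. It counts '\n' as one byte; on a platform where text-mode output expands '\n' to CR LF, the gratuitous newline would cost two bytes on disk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out` line by line, chopping the
   newline from each line and then appending one again, like the
   readln/writeln loop quoted above. Returns the number of bytes written.
   `out` must have room for strlen(in) + 2 bytes. */
size_t copy_chopped(const char *in, char *out)
{
    size_t written = 0;
    while (*in != '\0') {
        size_t len = strcspn(in, "\n");   /* line length without '\n' */
        memcpy(out + written, in, len);   /* the chopped line */
        written += len;
        out[written++] = '\n';            /* writeln always appends one */
        in += len;
        if (*in == '\n')
            in++;                         /* skip the newline we chopped */
    }
    out[written] = '\0';
    return written;
}
```

With the input "a\nb\n" (4 bytes) the output is also 4 bytes; with "a\nb" (3 bytes) the output is still "a\nb\n", one byte longer than the input.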

>>> Moreover, with the automated chopping it is basically impossible to 
>>> write a program that exactly reproduces its input because readln 
>>> essentially loses information.

A text file is not a binary file.
A newline at end of file is completely irrelevant.

On the other hand, no code should break whether the last newline is there
or not. The problem with your code is that the last line comes out
different from the others.

>>> stdio also offers a readln() that creates a new line on every 
>>> call. That is useful if you want fresh lines every read:
>>>
>>> char[] line;
>>> while ((line = readln()).length > 0) {
>>>   ++dictionary[line];
>>> }
>>
>> This way you'll get two different dictionaries on Windows and on Unix.
>> Wrong, very wrong.
> 
> Yes, wrong, very wrong. Except it's not me who's wrong :o).

Ehm, can you elaborate on how good it is to put a '\n' at the end of
every string when working with:

  - databases
  - communication programs
  - interprocess communication
  - distributed computing

>>> The code _just works_ because an empty line means _precisely_ and 
>>> without the shadow of a doubt that the file has ended. (An I/O error 
>>> throws an exception, and does NOT return an empty line; that is 
>>> another important point.) An API that uses automated chopping should 
>>> not offer such a function because an empty line may mean that an 
>>> empty line was read, or that it's eof time. So the API would force 
>>> people to write convoluted code.
>>
>> What is your definition of "convoluted"?
>> I find your code 'convoluted', 'unclear', 'buggy' and 'unportable'.
> 
> You are objectively wrong. 

Say 'subjectively'.
Assignments in boolean expressions should be avoided. The average
programmer knows something about this magic, but fears to touch it, and
never completely understands it.

Still, any programmer from any language would think that this code ends 
at the first empty line.

Here is one of the many possible non-convoluted versions:

char[] line = readln();
while (line.length > 0) {
   ++dictionary[chomp(line)];
   line = readln();
}

And this is how it should be:

char[] line = readln();
while (line !is null) {
   ++dictionary[line];
   line = readln();
}
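
The distinction this loop relies on, "no more lines" versus "an empty line", can be sketched in C on top of fgets. This is only an illustration under assumptions: read_line is a hypothetical helper, not part of any library discussed here, and it treats a read error the same as end of file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and strips the line
   terminator. Returns buf on success (an empty string for an empty
   line) and NULL at end of file, so the two cases stay distinct.
   Lines longer than size - 1 bytes are split across calls. */
char *read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;                    /* end of file (or read error) */
    buf[strcspn(buf, "\r\n")] = '\0';   /* cut at the first CR or LF */
    return buf;
}
```

A caller can then loop with `while (read_line(f, buf, sizeof buf) != NULL)` and an empty line in the middle of the file does not end the loop.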

> The code is portable. Newline translation 
> takes care of it. Just try it.

Newline translation is an old problem with C, C++ and now with D.
Nothing can be resolved with newline translation.

Opening a file in binary mode on Unix and treating it like a text file 
works only as long as the program is run on Unix.
Newline translation is prone to portability errors, thus non-portable.

In my experience, newline translation poses more portability problems
than it solves.

>>> In the couple of years I've used Perl I've thanked the Perl folks for 
>>> their readline decision numerous times.
>>
>> Perl is something the world should get rid of, quickly.
>> Perl is wrong, Perl is evil, Perl is useless.
>> You don't need Perl, try to cease using it.
>>
>> The fact that this narrow-minded idea comes from Perl is not surprising.
> 
> What can I say? Thanks! I'm enlightened!

You'd be more enlightened if you had to work with big CGI scripts
written in Perl, and eventually had to convert them to JSP so that the
average (available) programmers could work on them.

Sure, with Perl you can do many things in less than 10 lines.
But keep it under 10 lines, or you are in trouble.

>>> Ever tried to do cin or fscanf? You can't do any intelligent input 
>>> with them because they skip whitespace and newlines like it's out of 
>>> style. 
>>
>> I use them, and I find them very comfortable.
>> Again your definition of 'intelligent' is particular.
>> If you find Perl 'intelligent', this says a lot.
> 
> To each their own :o). Oh, probably you could explain how I can read a 
> string containing spaces, followed by ":" and a number with scanf. Takes 
> one line in Perl and D's readfln (not yet distributed).

scanf(" :%d", &i);
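
For comparison, a sketch that also stores the text before the colon, which the format above does not: the `%[^:]` scanset conversion matches a run of characters up to the colon, spaces included. The helper name parse_label and the 64-byte buffer are assumptions made for illustration, and sscanf is used instead of scanf so the input can be a plain string.

```c
#include <stdio.h>

/* Hypothetical helper: parses "some text:123" from `input`.
   Stores the text before the colon (spaces allowed) in `label`, which
   must hold at least 64 bytes, and the number in `*value`.
   Returns 1 on success, 0 otherwise. */
int parse_label(const char *input, char *label, int *value)
{
    /* %63[^:] reads up to 63 characters that are not ':';
       the literal ':' must follow, then %d reads the number. */
    return sscanf(input, "%63[^:]:%d", label, value) == 2;
}
```

Called with "hello world:42", this fills label with "hello world" and value with 42; input without a colon is rejected.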

Ciao