Google Code Jam 2011 Language Usage

Mon May 9 10:29:36 PDT 2011

On 5/8/11 5:57 PM, Timon Gehr wrote:
> Andrei Alexandrescu wrote:
>> On 5/8/11 3:04 PM, Timon Gehr wrote:
>>> However I agree that Phobos has to provide some better input handling, since using
>>> possibly unsafe C functions is the best way to do it by now. (I think readf is
>>> severely crippled) I may try to implement a meaningful "read" function.
>>
>> Looking forward to detailed feedback about readf. It was implemented in
>> a hurry so definitely it has a long way to go.
>>
>> Andrei
>
> What I consider the most important points about readf:

Thanks very much for providing detailed feedback.

> 1. Whitespace handling is different than scanf. It is much stricter and even feels
> inconsistent, e.g.:
>
> int a,b;
>
> readf("%s %s",&a,&b);//input "1 2\n" read.
> readf("%s %s",&a,&b);//input "1  2\n" read (and a==1 && b==2).

So far so good. By design one space in readf means "skip all whitespace".

> readf("%s",&a);//input "1\n" read. yay.
> readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.

I'm not seeing skipping in my tests; I do see an exception being thrown. 
Here's how I test:

import std.stdio;
void main()
{
     int a, b;
     readf("%s",&a);
     assert(a == 1);
     readf("%s",&b);
     assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The first input is read into 'a' and reading stops just at the \n. Next 
you're trying to read "\n 2" into b, which fails due to the strict 
whitespace handling. To fix this, you'd need to insert a space before 
the second "%s".

I'm not hooked on this strict whitespace handling, but I think it makes 
a lot of sense particularly when you want to make sure the input looks 
exactly as you think it should. With scanf you can't have precise 
parsing even if you wanted; with readf all you need is to insert a space.
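
To make the difference concrete, here is a minimal sketch. It assumes std.format.formattedRead applies the same whitespace rules to an in-memory string that readf applies to stdin, and it uses a hypothetical helper readTwoInts so it runs without any piped input:

```d
import std.exception : assertThrown;
import std.format : formattedRead;

// Hypothetical helper (not part of Phobos): reads two ints with two
// separate calls, using `secondFmt` as the format of the second call.
int[2] readTwoInts(string input, string secondFmt)
{
    int a, b;
    formattedRead(input, "%s", &a);      // stops right before the '\n'
    formattedRead(input, secondFmt, &b); // starts at "\n 2\n"
    return [a, b];
}

void main()
{
    // Strict: "%s" does not skip the leading "\n ", so parsing throws.
    assertThrown(readTwoInts("1\n 2\n", "%s"));

    // Relaxed: one space in the format skips all whitespace.
    assert(readTwoInts("1\n 2\n", " %s") == [1, 2]);
}
```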

Precision is important. For example, Hive uses a \t for field separation 
when streaming to a file. It is very important to be able to tell whether 
you have one tab there or two (two means a NULL field was in between).
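
A sketch of what that precision buys, again on an in-memory string via std.format.formattedRead (assumed to follow the same rules as readf). The helper readTabSeparated is hypothetical, and the two-int record layout is made up for illustration:

```d
import std.exception : assertThrown;
import std.format : formattedRead;

// Hypothetical helper: parses exactly "<int>\t<int>".
int[2] readTabSeparated(string line)
{
    int a, b;
    formattedRead(line, "%s\t%s", &a, &b);
    return [a, b];
}

void main()
{
    // Exactly one tab between the fields: accepted.
    assert(readTabSeparated("1\t2") == [1, 2]);

    // Two tabs: the second %s sees "\t2" and throws, so the extra
    // (empty) field is detected rather than silently skipped.
    assertThrown(readTabSeparated("1\t\t2"));
}
```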

> readf("%s ",&a);//input "1 \n" read.
> readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!)
> is missing.

On my machine this passes:

import std.stdio;
void main()
{
     int a, b;
     readf("%s ",&a);
     assert(a == 1);
     readf("%s ",&b);
     assert(b == 2);
}

dmd ./test && echo '1\n 2' | ./test

The explanation is that, again, a space means "skip all whitespace". So 
the first space eats the "\n " and the second space eats the final "\n" 
in the input (produced by echo). Please adjust this example so it unduly 
fails.
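
The same reasoning can be checked without a pipe. This sketch assumes std.format.formattedRead treats a trailing space the way readf does; remainderAfterTwoReads is a hypothetical helper that returns whatever input is left over:

```d
import std.format : formattedRead;

// Hypothetical helper: performs two "%s " reads and returns the
// unconsumed rest of the input.
string remainderAfterTwoReads(string input, out int a, out int b)
{
    formattedRead(input, "%s ", &a); // the space eats "\n "
    formattedRead(input, "%s ", &b); // the space eats the final "\n"
    return input;
}

void main()
{
    int a, b;
    auto rest = remainderAfterTwoReads("1\n 2\n", a, b);
    assert(a == 1 && b == 2);
    assert(rest.length == 0); // nothing left, not even the newline
}
```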

> readf(" %s",&a);//input "1\n" read.
> readf("\t%s",&a);//input "1\n": exception is thrown.

A "\t" in the formatting string for readf simply requires a tab. To make 
that tab optional, do this:

readf("%*1[\t]%s",&a);

That instructs readf to read, but not store, a string consisting of at 
most one tab. (To skip multiple tabs drop the "1".) This functionality 
is not yet implemented.

> readf("%s\n",&a);//input "1\n" read.
> readf("%s\n",&a);//input "1 \n": exception is thrown.

That is as expected - if you specify \n readf expects a \n.

> readf("%s\t\n",&a);//input "1\t\n" read.
> readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
> further input.

My testbed:

import std.stdio;

void main()
{
     int a, b;
     readf("%s\t\n",&a);
     assert(a == 1);
     readf("%s \n",&b);
     assert(b == 2);
}

dmd ./test && echo "1\t\n2 " | ./test

It fails because it can't find the last \n. That's a bug.

> And some more, I do not remember all of them. Exceptions are most of the time only
> as useful as "Enforcement failed".
>
>
> You (almost?) never want this behavior, even at the points it marginally makes
> sense. It would be nice to have an optional whitespace-enforcing version that
> _really_ enforces it
> (as opposed to the current implementation), but that should not be the default.
> And then it should be consistent (also on skipping or exception throwing).

Except for one bug and one feature that is not yet implemented, I find the 
current behavior consistent with a strict approach to whitespace handling.

> 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its '>>'
> more.
>     scanf has that problem too, but it is a C function, you _cannot_ expect it to
> do any better than that.
>     D has variadic template functions that may take ref parameters. It can be done
> entirely pointer-free.

When I implemented readf, ref variadic arguments weren't working. I'd be 
hesitant to change it right now as it does not improve actual 
functionality and disrupts current uses. But I agree ideally it should 
accept parameters by reference.
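
For illustration only, a pointer-free front end could look roughly like the sketch below. readValues is a hypothetical name, it reads from a string rather than stdin, and it assumes a compiler where ref variadic template parameters work:

```d
import std.format : formattedRead;

// Hypothetical wrapper (not the Phobos API): callers pass variables by
// reference; the pointers stay hidden inside.
void readValues(Args...)(ref string input, ref Args args)
{
    foreach (i, T; Args)
        formattedRead(input, " %s", &args[i]); // skip whitespace, then read
}

void main()
{
    string input = "1 2.5\n";
    int a;
    double b;
    readValues(input, a, b); // no '&' at the call site
    assert(a == 1 && b == 2.5);
}
```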

> 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why did
> you throw away the idea of static overloads? It would have been a powerful feature,
>     and very useful for this case. scanf in C/C++ does not have this problem,
> because most modern compilers generate warnings for this. But that is making some
> functions
>     "more equal than the others"

One early version I had was doing that and spelled

readf!"format string"(arguments);

Unfortunately, sometimes runtime-computed formatting strings are needed 
and useful (see the recent std.log discussion...) so I decided to go 
with dynamic formatting for now. Once we get that right, providing an 
optional compile-time-checked formatting function shouldn't be too 
difficult with CTFE.
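
A rough sketch of what such a check might look like. Both countSpecs and checkedRead are hypothetical names, the check only counts specifiers (it ignores "%*" and type compatibility), and the forwarding call assumes a Phobos version whose formattedRead accepts arguments by reference:

```d
import std.format : formattedRead;

// Counts conversion specifiers, treating "%%" as a literal percent.
// Plain code like this can be evaluated at compile time via CTFE.
size_t countSpecs(string fmt)
{
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 < fmt.length && fmt[i + 1] == '%')
            ++i; // skip the escaped percent sign
        else
            ++n;
    }
    return n;
}

// Hypothetical checked front end: the format is a template argument.
void checkedRead(string fmt, Args...)(ref string input, ref Args args)
{
    static assert(countSpecs(fmt) == Args.length,
        "format string and argument count do not match");
    formattedRead(input, fmt, args);
}

void main()
{
    string input = "1 2";
    int a, b;
    checkedRead!"%s %s"(input, a, b);
    assert(a == 1 && b == 2);

    // The "mooh" case is now rejected at compile time.
    static assert(!__traits(compiles, checkedRead!"mooh"(input, a)));
}
```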

> 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
> mistakenly claimed before). I think this is just a quality of implementation
> issue, but it is important.

I agree. I'm amazed readf is not slower actually. It uses character-by-character 
file iteration, by far the slowest (and most embarrassing) code I wrote 
in Phobos: each character read entails one call to getc() to fetch the 
character, one call to ungetc() to restore the stream position, and 
finally one more call to getc() to move forward. The code is correct but 
very slow. Some C APIs provide undocumented means to peek at the next 
character in the stream without actually advancing the stream, which is 
what we need. I know how to do it on most Unixen and Walter knows how to 
do it on his own cstdlib implementation. We haven't had the time yet, 
and I'm glad the matter is in the spotlight.
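
For reference, the portable peek looks roughly like this. It is a sketch of the pattern, not the actual Phobos code; peekChar is a hypothetical name, and tmpfile() is used only to keep the example self-contained:

```d
import core.stdc.stdio : EOF, FILE, fclose, fputs, getc, rewind,
    tmpfile, ungetc;

// Portable but slow: fetch the character, then push it back.
int peekChar(FILE* f)
{
    immutable c = getc(f);
    if (c != EOF)
        ungetc(c, f);
    return c;
}

void main()
{
    FILE* f = tmpfile();
    assert(f !is null);
    scope (exit) fclose(f);
    fputs("42", f);
    rewind(f);

    // Each consumed character costs getc + ungetc + getc.
    assert(peekChar(f) == '4'); // look without consuming
    assert(getc(f) == '4');     // now actually advance
    assert(peekChar(f) == '2');
    assert(getc(f) == '2');
    assert(peekChar(f) == EOF);
}
```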

>     Especially for programming competitions where there are time limits, you do not
> want IO to unnecessarily become a major bottleneck. (Input files can be huge)

Agreed.

>     Other than that, D is WAY the most convenient language I have ever tried to
> solve small algorithmic tasks in.
>
> 5. Not really readf related: There's writef(ln) and there is write(ln). And then
> there is readf. I will provide a proof-of-concept for the read function soon.

Good idea. I suggest you provide a template read(T)() that mimics the 
functionality of Java's nextInt, nextFloat etc:

auto a = stdin.next!int();
auto b = stdin.next!double();
auto s = stdin.next!string("\n"); // read a string up to \n
...
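
A minimal sketch of the idea, assuming the input is a string instead of a File and covering only types that std.conv.parse handles directly (the delimiter variant for strings is left out). The name next is just the one suggested above, not an existing Phobos function:

```d
import std.conv : parse;
import std.string : stripLeft;

// Hypothetical: skip leading whitespace, then parse one value of type T
// and advance the input past it.
T next(T)(ref string input)
{
    input = input.stripLeft();
    return parse!T(input);
}

void main()
{
    string input = "1 2.5\n";
    auto a = input.next!int();
    auto b = input.next!double();
    assert(a == 1 && b == 2.5);
}
```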

Andrei