Google Code Jam 2011 Language Usage
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Mon May 9 10:29:36 PDT 2011
On 5/8/11 5:57 PM, Timon Gehr wrote:
> Andrei Alexandrescu wrote:
>> On 5/8/11 3:04 PM, Timon Gehr wrote:
>>> However I agree that Phobos has to provide some better input handling, since using
>>> possibly unsafe C functions is the best way to do it by now. (I think readf is
>>> severely crippled) I may try to implement a meaningful "read" function.
>>
>> Looking forward to detailed feedback about readf. It was implemented in
>> a hurry so definitely it has a long way to go.
>>
>> Andrei
>
> What I consider the most important points about readf:
Thanks very much for providing detailed feedback.
> 1. Whitespace handling is different from scanf's. It is much stricter and even feels
> inconsistent, e.g.:
>
> int a,b;
>
> readf("%s %s",&a,&b);//input "1 2\n" read.
> readf("%s %s",&a,&b);//input "1 2\n" read (and a==1&& b==2).
So far so good. By design one space in readf means "skip all whitespace".
> readf("%s",&a);//input "1\n" read. yay.
> readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.
I'm not seeing skipping in my tests; I do see an exception being thrown.
Here's how I test:
import std.stdio;
void main()
{
    int a, b;
    readf("%s",&a);
    assert(a == 1);
    readf("%s",&b);
    assert(b == 2);
}
dmd ./test && echo '1\n 2' | ./test
The first input is read into 'a' and reading stops just at the \n. Next
you're trying to read "\n 2" into b, which fails due to the strict
whitespace handling. To fix this, you'd need to insert a space before
the second "%s".
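The effect of that extra space can be checked without a terminal by feeding the same input to std.format.formattedRead. This is a minimal sketch: it assumes readf applies the same whitespace rules as formattedRead, and that a failed conversion surfaces as an Exception.

```d
import std.exception : assertThrown;
import std.format : formattedRead;

unittest
{
    int a, b;

    // Strict format: the second "%s" meets "\n 2" and cannot convert it.
    string input = "1\n 2\n";
    formattedRead(input, "%s", &a);
    assert(a == 1);
    assertThrown(formattedRead(input, "%s", &b));

    // A leading space in the format skips the "\n " before converting.
    input = "1\n 2\n";
    formattedRead(input, "%s", &a);
    formattedRead(input, " %s", &b);
    assert(a == 1 && b == 2);
}
```

Compiling with dmd -unittest -main runs the checks.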
I'm not hooked on this strict whitespace handling, but I think it makes
a lot of sense particularly when you want to make sure the input looks
exactly as you think it should. With scanf you can't have precise
parsing even if you wanted; with readf all you need is to insert a space.
Precision is important. For example, Hive uses a \t for field separation
when streaming to a file. It is very important to figure that you have
one tab there versus two (two means a NULL field was in between).
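To illustrate why counting separators matters, here is a small sketch that splits a tab-separated record with std.array.split. The record contents are made up for the example, and the empty string stands in for the NULL field described above.

```d
import std.array : split;

unittest
{
    // One tab between fields: two fields, nothing missing.
    assert("alice\t42".split("\t") == ["alice", "42"]);

    // Two tabs in a row: an empty field sits between them.
    assert("alice\t\t42".split("\t") == ["alice", "", "42"]);
}
```

A parser that collapsed runs of whitespace would report two fields in both cases and lose the empty one.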
> readf("%s ",&a);//input "1 \n" read.
> readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!)
> is missing.
On my machine this passes:
import std.stdio;
void main()
{
    int a, b;
    readf("%s ",&a);
    assert(a == 1);
    readf("%s ",&b);
    assert(b == 2);
}
dmd ./test && echo '1\n 2' | ./test
The explanation is that, again, a space means "skip all whitespace". So
the first space eats the "\n " and the second space eats the final "\n"
in the input (produced by echo). Please adjust this example so that it
reproduces the failure you are seeing.
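The same trailing-space behavior can be traced on an in-memory string. This sketch again assumes that std.format.formattedRead, which consumes its input as it goes, treats a space in the format the way readf does.

```d
import std.format : formattedRead;

unittest
{
    string input = "1\n 2\n";
    int a, b;

    formattedRead(input, "%s ", &a); // trailing space consumes "\n "
    assert(a == 1 && input == "2\n");

    formattedRead(input, "%s ", &b); // trailing space consumes the final "\n"
    assert(b == 2 && input.length == 0);
}
```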
> readf(" %s",&a);//input "1\n" read.
> readf("\t%s",&a);//input "1\n": exception is thrown.
A "\t" in the formatting string for readf simply requires a tab. To skip
over an optional tab, do this:
readf("%*1[\t]%s",&a);
That instructs readf to read, but not store, a string consisting of at
most one tab. (To skip any number of tabs, drop the "1".) This functionality
is not yet implemented.
> readf("%s\n",&a);//input "1\n" read.
> readf("%s\n",&a);//input "1 \n": exception is thrown.
That is as expected - if you specify \n readf expects a \n.
> readf("%s\t\n",&a);//input "1\t\n" read.
> readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
> further input.
My testbed:
import std.stdio;
void main()
{
    int a, b;
    readf("%s\t\n",&a);
    assert(a == 1);
    readf("%s \n",&b);
    assert(b == 2);
}
dmd ./test && echo "1\t\n2 " | ./test
It fails because it can't find the last \n. That's a bug.
> And some more, I do not remember all of them. Exceptions are most of the time only
> as useful as "Enforcement failed".
>
>
> You (almost?) never want this behavior, even at the points it marginally makes
> sense. It would be nice to have an optional whitespace-enforcing version that
> _really_ enforces it
> (as opposed to the current implementation), but that should not be the default.
> And then it should be consistent (also on skipping or exception throwing).
Except for one bug and one missing piece of implementation, I find the
current behavior consistent with a strict approach to whitespace handling.
> 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its '>>'
> more.
> scanf has that problem too, but it is a C function, you _cannot_ expect it to
> do any better than that.
> D has variadic template functions that may take ref parameters. It can be done
> entirely pointer-free.
When I implemented readf, ref variadic arguments weren't working. I'd be
hesitant to change it right now as it does not improve actual
functionality and disrupts current uses. But I agree ideally it should
accept parameters by reference.
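For reference, a pointer-free wrapper over the existing readf could look roughly like this. It is only a sketch: readValues is a hypothetical name, not a Phobos function, and it assumes a compiler where ref variadic template parameters work.

```d
import std.stdio;

// Hypothetical helper, not part of Phobos: fills each argument in turn.
void readValues(Args...)(File f, ref Args args)
{
    foreach (ref arg; args)
        f.readf(" %s", &arg); // leading space skips any whitespace first
}
```

A call would then read stdin.readValues(n, x); for previously declared int n and double x, with no address-of operators at the call site.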
> 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why did
> you throw away the idea of static overloads? It would have been a powerful feature,
> and very useful for this case. scanf in C/C++ does not have this problem,
> because most modern compilers generate warnings for this. But that is making some
> functions
> "more equal than the others"
One early version I had was doing that and spelled
readf!"format string"(arguments);
Unfortunately, sometimes runtime-computed formatting strings are needed
and useful (see the recent std.log discussion...) so I decided to go
with dynamic formatting for now. Once we get that right, providing an
optional compile-time-checked formatting function shouldn't be too
difficult with CTFE.
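A rough idea of what such an optional check could look like is below. countSpecs and checkedReadf are hypothetical names, and the counting is deliberately naive: it treats every unescaped '%' as one stored argument and ignores non-storing specs such as "%*s".

```d
import std.stdio;

// Hypothetical helper: counts conversion specs, treating "%%" as a literal '%'.
size_t countSpecs(string fmt)
{
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 < fmt.length && fmt[i + 1] == '%')
            ++i;        // skip the escaped percent sign
        else
            ++n;
    }
    return n;
}

// Hypothetical wrapper, not part of Phobos: the format string is a template
// argument, so countSpecs runs through CTFE and a mismatch stops compilation.
uint checkedReadf(string fmt, Args...)(Args args)
{
    static assert(countSpecs(fmt) == Args.length,
        "format string and argument count disagree");
    return readf(fmt, args);
}

static assert(countSpecs("%s %s") == 2);
static assert(countSpecs("mooh") == 0);
```

With this in place, checkedReadf!"mooh"(&a) is rejected at compile time, while the runtime readf stays available for computed format strings.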
> 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
> mistakenly claimed before). I think this is just a quality of implementation
> issue, but it is important.
I agree. I'm actually amazed readf is not slower. It iterates the file
character by character, by far the slowest (and most embarrassing) code I wrote
in Phobos: each character read entails one call to getc() to fetch the
character, one call to ungetc() to restore the stream position, and
finally one more call to getc() to move forward. The code is correct but
very slow. Some C APIs provide undocumented means to peek at the next
character in the stream without actually advancing the stream, which is
what we need. I know how to do it on most Unixen and Walter knows how to
do it on his own cstdlib implementation. We haven't had the time yet,
and I'm glad the matter is in the spotlight.
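The three-call pattern described above can be sketched with the portable C API as follows. This is an illustration only, not the actual Phobos code; peekChar and advance are hypothetical names.

```d
import core.stdc.stdio : EOF, FILE, getc, ungetc;

// Look at the next character without consuming it.
int peekChar(FILE* stream)
{
    int c = getc(stream);       // fetch the character
    if (c != EOF)
        ungetc(c, stream);      // restore the stream position
    return c;
}

// Consume the character that peekChar reported.
void advance(FILE* stream)
{
    getc(stream);               // move forward
}
```

Every character costs a peekChar plus an advance, that is, three C library calls. A platform-specific peek into the stream buffer would save the ungetc and the second getc.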
> Especially for programming competitions where there are time limits, you do not
> want IO to unnecessarily become a major bottleneck. (Input files can be huge)
Agreed.
> Other than that, D is WAY the most convenient language I have ever tried to
> solve small algorithmic tasks in.
> 5. Not really readf related: There's writef(ln) and there is write(ln). And then
> there is readf. I will provide a proof-of-concept for the read function soon.
Good idea. I suggest you provide a template read(T)() that mimics the
functionality of Java's nextInt, nextFloat etc:
auto a = stdin.next!int();
auto b = stdin.next!double();
auto s = stdin.next!string("\n"); // read a string up to \n
...
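A minimal sketch of such a template, built on the existing readf, might look like this. It is restricted to numeric types, leaves out the string-with-delimiter overload, and assumes a compiler that allows calling a free function with member syntax on a File.

```d
import std.stdio;
import std.traits : isNumeric;

// Hypothetical sketch, not part of Phobos: reads one whitespace-delimited value.
T next(T)(File f) if (isNumeric!T)
{
    T value;
    f.readf(" %s", &value); // leading space skips whitespace before the value
    return value;
}
```

With this definition the calls stdin.next!int() and stdin.next!double() from the snippet above compile as written.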
Andrei
More information about the Digitalmars-d
mailing list