Google Code Jam 2011 Language Usage

Mon May 9 11:19:17 PDT 2011

On 5/9/11 12:43 PM, Timon Gehr wrote:
> Andrei Alexandrescu wrote:
>> I've implemented readf to be a fair amount more Nazi about whitespace than
>> scanf in an attempt to improve its precision. Scanf has been famously difficult
>> to use for complex input parsing and validation, and I attribute some of that
>> to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
>> readf's insistence on precise whitespace handling if there's enough evidence
>> that that serves most of our users. I personally believe that the current
>> behavior (strict by default, easy to relax) is best.
>
> In my experience readf behavior is not very useful for routine coding tasks that
> involve some IO.

If this assessment could be reversed simply by inserting spaces in the 
format string, I'd be hard pressed to agree.

I do agree that readf behavior is surprising if you expect 100% scanf 
compatibility. This is intentional and beneficial as I believe scanf is 
wanting in more than one way.

> If you really need to have very strict requirements about the input format, readf
> does not serve you well, because a ' ' still skips all whitespace, a failure to
> read leaves the file pointer in an undefined position etc.

That is not an issue (albeit some of the underlying machinery is not yet 
implemented). If you want to skip at most one space but no other 
whitespace, insert "%*1[ ]" in the formatting string. To skip any number 
of spaces, insert "%*[ ]". Skipping exactly one space is not supported 
at the formatting string level, but you can always read one character 
with %c and then check that the character is ' '. I agree that this could 
be improved. What's needed is a specification for the minimum number of 
characters read, e.g. "%*1.1[ ]" for scanning and skipping exactly one 
space.

In contrast, having e.g. %d skipping all whitespace is a losing 
proposition if you want to do precision parsing. This is because that 
behavior can't be disabled. That's why I excised it.

Reading is greedy. Failure to read leaves the pointer in a defined 
position, but we need to improve the documentation.

> All carryovers from
> scanf. I never want to use scanf when there is a valid chance of invalid input.

I agree, but that's a problem with scanf that should and could be fixed. 
There's almost always a chance of invalid input.

> As
> far as I can see, neither readf nor scanf can be used for sophisticated input
> validation or parsing of non-trivial input. You have to do it manually. How does
> readf make things better with strict(er) whitespace handling?

As far as I can see, implementing the POSIX %[charset] extension would 
make readf a powerful one-stop shop for parsing input. Of course its speed 
needs to be up to snuff too. And of course its specification can be 
improved, which is where your input is very valuable.

> What behavior is by design, what behavior is caused by bugs? Can you give a
> real-world example where readf design clearly beats scanf design? (as it is the
> default it should be almost always better, but I fail to see it)
>
> Apart from that, what about the other points I mentioned?

I answered all of these in my other, longer post.

Andrei