Google Code Jam 2011 Language Usage
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Mon May 9 11:19:17 PDT 2011
On 5/9/11 12:43 PM, Timon Gehr wrote:
> Andrei Alexandrescu wrote:
>> I've implemented readf to be a fair amount more Nazi about whitespace than
>> scanf in an attempt to improve its precision. Scanf has been famously difficult
>> to use for complex input parsing and validation, and I attribute some of that
>> to its laissez-faire attitude toward whitespace. I'd be glad to relax some of
>> readf's insistence on precise whitespace handling if there's enough evidence
>> that that serves most of our users. I personally believe that the current
>> behavior (strict by default, easy to relax) is best.
>
> In my experience readf behavior is not very useful for routine coding tasks that
> involve some IO.
If this assessment can be reversed by simply inserting spaces in the
format string, I'd be hard pressed to agree.
I do agree that readf's behavior is surprising if you expect 100% scanf
compatibility. The difference is intentional and, I believe, beneficial,
because scanf is wanting in more than one way.
> If you really need to have very strict requirements about the input format, readf
> does not serve you well, because a ' ' still skips all whitespace, a failure to
> read leaves the file pointer in an undefined position etc.
That is not an issue (although some of the underlying machinery is not
yet implemented). If you want to skip at most one space but no other
whitespace, insert "%*1[ ]" in the format string. To skip any number
of spaces, insert "%*[ ]". Skipping exactly one space is not supported
at the format string level, but you can always read one character
with %c and then check that it is ' '. I agree that this could
be improved. What's needed is a way to specify the minimum number of
characters read, e.g. "%*1.1[ ]" for scanning and skipping exactly one
space.
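For comparison, here is a minimal sketch of the "%c, then enforce ' '" workaround in C's sscanf (not D's readf, whose API may differ). The helper name parse_pair_strict is hypothetical. It also shows how much manual checking C needs, because %d skips whitespace on its own:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: accepts exactly "<int> <int>" with a single space
   between the numbers and nothing else. Returns 1 on success, 0 otherwise. */
int parse_pair_strict(const char *s, int *a, int *b)
{
    char sep;
    int used1 = 0, used2 = 0;

    /* %d would silently skip leading whitespace, so reject it up front. */
    if (isspace((unsigned char)s[0]))
        return 0;
    /* %c does not skip whitespace: it stores the separator as-is. */
    if (sscanf(s, "%d%c%n", a, &sep, &used1) != 2 || sep != ' ')
        return 0;
    /* The second %d would also skip extra spaces, so reject a second one. */
    if (isspace((unsigned char)s[used1]))
        return 0;
    if (sscanf(s + used1, "%d%n", b, &used2) != 1)
        return 0;
    /* Reject any trailing characters after the second number. */
    return s[used1 + used2] == '\0';
}
```

With this helper, "1 2" is accepted, while "1  2", "1\t2", " 1 2" and "1 2x" are all rejected.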
In contrast, having %d (for example) skip all leading whitespace is a
losing proposition if you want to do precise parsing, because that
behavior can't be disabled. That's why I excised it.
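To make the C behavior concrete, here is a small sketch. The helper name scan_two_ints is hypothetical. With C's sscanf, a format containing no whitespace at all still accepts numbers separated by arbitrary spaces, tabs, and newlines:

```c
#include <stdio.h>

/* Hypothetical helper: returns how many ints C's sscanf extracts from s
   using the format "%d%d". */
int scan_two_ints(const char *s, int *x, int *y)
{
    /* The format has no whitespace directive, yet each %d still skips any
       leading spaces, tabs, and newlines; C offers no flag to turn that off. */
    return sscanf(s, "%d%d", x, y);
}
```

Both "1 2" and "  1\n\t 2" yield two integers, so the format string alone cannot express "no whitespace allowed here".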
Reading is greedy. A failed read leaves the pointer in a defined
position, but the documentation needs to say so.
> All carryovers from
> scanf. I never want to use scanf when there is a valid chance of invalid input.
I agree, but that's a problem with scanf that can and should be fixed.
There's almost always a chance of invalid input.
> As
> far as I can see, neither readf nor scanf can be used for sophisticated input
> validation or parsing of non-trivial input. You have to do it manually. How does
> readf make things better with strict(er) whitespace handling?
As far as I can see, implementing the Posix %[charset] extension would
make readf a powerful one-stop shop for parsing input. Of course, its
speed needs to be up to snuff too, and its specification can be
improved, which is where your input is very valuable.
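As a rough illustration of what scansets buy you, here is a sketch in C, whose sscanf already supports %[...]. The helper name parse_kv and the "key=value" line format are hypothetical examples, not anything readf defines:

```c
#include <stdio.h>

/* Hypothetical helper: splits "key=value" using scansets.
   key: 1-31 characters other than '='; value: 1-31 characters other
   than '\n'. Returns 1 only if the whole line was consumed. */
int parse_kv(const char *line, char key[32], char value[32])
{
    int end = 0;
    /* %[...] never skips whitespace, so " a=b" keeps the space in key.
       A scanset must match at least one character, so "=80" fails. */
    if (sscanf(line, "%31[^=]=%31[^\n]%n", key, value, &end) != 2)
        return 0;
    /* Reject anything left over, e.g. a value longer than 31 characters. */
    return line[end] == '\0';
}
```

Because a scanset states exactly which characters it accepts, the format string itself carries most of the validation.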
> What behavior is by design, what behavior is caused by bugs? Can you give a
> real-world example where readf design clearly beats scanf design? (as it is the
> default it should be almost always better, but I fail to see it)
>
> Apart from that, what about the other points I mentioned?
I answered all of these in my other, longer post.
Andrei
More information about the Digitalmars-d mailing list