Google Code Jam 2011 Language Usage
Timon Gehr
timon.gehr at gmx.ch
Mon May 9 14:00:22 PDT 2011
Sorry, I overlooked this post.
Andrei Alexandrescu wrote:
> On 5/8/11 5:57 PM, Timon Gehr wrote:
>> Andrei Alexandrescu wrote:
>>> On 5/8/11 3:04 PM, Timon Gehr wrote:
>>>> However I agree that Phobos has to provide some better input handling, since using
>>>> possibly unsafe C functions is the best way to do it by now. (I think readf is
>>>> severely crippled) I may try to implement a meaningful "read" function.
>>>
>>> Looking forward to detailed feedback about readf. It was implemented in
>>> a hurry so definitely it has a long way to go.
>>>
>>> Andrei
>>
>> What I consider the most important points about readf:
>
> Thanks very much for providing detailed feedback.
>
>> 1. Whitespace handling is different than scanf. It is much stricter and even feels
>> inconsistent, Eg:
>>
>> int a,b;
>>
>> readf("%s %s",&a,&b);//input "1 2\n" read.
>> readf("%s %s",&a,&b);//input "1 2\n" read (and a==1&& b==2).
>
> So far so good. By design one space in readf means "skip all whitespace".
>
>> readf("%s",&a);//input "1\n" read. yay.
>> readf("%s",&a);//input " 1\n" skipped. All subsequent input is skipped too.
>
> I'm not seeing skipping in my tests; I do see an exception being thrown.
> Here's how I test:
>
> import std.stdio;
> void main()
> {
> int a, b;
> readf("%s",&a);
> assert(a == 1);
> readf("%s",&b);
> assert(b == 2);
> }
>
> dmd ./test && echo '1\n 2' | ./test
I tested by typing the input manually into a terminal. The exception is thrown only
when I provide an EOF. It seems the input is not being skipped after all; rather,
readf does not return until there is an EOF.
> I'm not hooked on this strict whitespace handling, but I think it makes
> a lot of sense particularly when you want to make sure the input looks
> exactly as you think it should. With scanf you can't have precise
> parsing even if you wanted; with readf all you need is to insert a space.
>
> Precision is important. For example, Hive uses a \t for field separation
> when streaming to a file. It is very important to figure that you have
> one tab there versus two (two means a NULL field was in between).
It should be possible to do that with scanf using %[] if I'm not mistaken.
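A sketch of how that could look through D's C bindings, under stated assumptions: tabsBetweenFields is a hypothetical helper, fields are limited to 63 characters, and only the first two fields are examined. The scanset %[\t] captures the run of tabs, so its length tells one tab from two:

```d
import core.stdc.stdio : sscanf;
import core.stdc.string : strlen;

// Hypothetical helper: returns the number of tabs between the first two
// fields of a zero-terminated line, or -1 if the line does not match.
int tabsBetweenFields(const(char)* line)
{
    char[64] first = '\0';
    char[64] second = '\0';
    char[8] tabs = '\0';

    // %63[^\t]   : first field, everything up to the first tab
    // %7[\t]     : the run of tabs itself (one or more, at most seven)
    // %63[^\t\n] : second field, up to the next tab or newline
    immutable matched = sscanf(line, "%63[^\t]%7[\t]%63[^\t\n]",
                               first.ptr, tabs.ptr, second.ptr);
    if (matched != 3)
        return -1;
    return cast(int) strlen(tabs.ptr);
}
```

A result of 2 would then indicate an empty (NULL) field between the two values. Note that the field widths have to be hard-coded in the format string, which is part of what makes scanf awkward here.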
>> readf("%s ",&a);//input "1 \n" read.
>> readf("%s ",&a);//input "1\n" skipped, presumably because the trailing space (!)
>> is missing.
>
> On my machine this passes:
>
> import std.stdio;
> void main()
> {
> int a, b;
> readf("%s ",&a);
> assert(a == 1);
> readf("%s ",&b);
> assert(b == 2);
> }
>
> dmd ./test && echo '1\n 2' | ./test
>
> The explanation is that, again, a space means "skip all whitespace". So
> the first space eats the "\n " and the second space eats the final "\n"
> in the input (produced by echo). Please adjust this example so it unduly
> fails.
Again, a misinterpretation on my side. When typing into the terminal, readf keeps
waiting for input until a non-whitespace character is entered. That should be fine,
but it can be surprising.
>
>> readf(" %s",&a);//input "1\n" read.
>> readf("\t%s",&a);//input "1\n": exception is thrown.
>
> A "\t" in the formatting string for readf simply requires a tab. To skip
> over any number of tabs, do this:
>
> readf("%*1[\t]%s",&a);
>
> That instructs readf to read, but not store, a string consisting of at
> most one tab. (To skip multiple tabs drop the "1".) This functionality
> is not yet implemented.
I did not know it would ever be! That removes many of my concerns. (and the 'read'
function removes the rest)
>> readf("%s\n",&a);//input "1\n" read.
>> readf("%s\n",&a);//input "1 \n": exception is thrown.
>
> That is as expected - if you specify \n readf expects a \n.
>
>> readf("%s\t\n",&a);//input "1\t\n" read.
>> readf("%s \n",&a);//input "1 \n" skipped. readf throws an exception after any
>> further input.
>
> My testbed:
>
> import std.stdio;
>
> void main()
> {
> int a, b;
> readf("%s\t\n",&a);
> assert(a == 1);
> readf("%s \n",&b);
> assert(b == 2);
> }
>
> dmd ./test && echo "1\t\n2 " | ./test
>
> It fails because it can't find the last \n. That's a bug.
At least I found one. =)
>> And some more, I do not remember all of them. Exceptions are most of the time only
>> as useful as "Enforcement failed".
>>
>>
>> You (almost?) never want this behavior, even at the points it marginally makes
>> sense. It would be nice to have an optional whitespace-enforcing version that
>> _really_ enforces it
>> (as opposed to the current implementation), but that should not be the default.
>> And then it should be consistent (also on skipping or exception throwing).
> Except for one bug and one lacking implementation artifact, I find the
> current behavior consistent with a strict approach to whitespace handling.
Agreed. Thanks for your explanations!
>> 2. readf takes pointers. Ugly, end of story. I even like C++ cin with all its '>>'
>> more.
>> scanf has that problem too, but it is a C function, you _cannot_ expect it to
>> do any better than that.
>> D has variadic template functions that may take ref parameters. It can be done
>> entirely pointer-free.
>
> When I implemented readf, ref variadic arguments weren't working. I'd be
> hesitant to change it right now as it does not improve actual
> functionality and disrupts current uses. But I agree ideally it should
> accept parameters by reference.
We can have both, since it will never be possible to read in raw pointers:
import std.stdio;
import std.conv;
// Nesting this inside the containsPointers template removes the eponymous template trick. Is this a bug?
private bool containsPointersImpl(T...)(){
foreach(t;T) static if(is(t U:U*)) return true;
return false;
}
template containsPointers(T...){enum containsPointers=containsPointersImpl!T();}
private bool onlyPointersImpl(T...)(){
foreach(t;T) static if(!is(t U:U*)) return false;
return true;
}
template onlyPointers(T...){enum onlyPointers=onlyPointersImpl!T();}
private string _readfImpl(int len){
string res="return std.stdio.stdin.readf(format,";
foreach(t;0..len) res~="&args["~to!string(t)~"], ";
res~=");";
return res;
}
int _readf(T...)(string format, ref T args)
if(!containsPointers!T){mixin(_readfImpl(T.length));}
//classic definition for backwards compatibility.
int _readf(T...)(string format, T args) if(onlyPointers!T){
return std.stdio.stdin.readf(format, args);
}
void main(){
int a;
_readf(" %s",&a);
writeln(a);
_readf(" %s",a);
writeln(a);
}
>> 3. nonsense like readf("mooh",&a); cannot be caught at compile time. When/Why did
>> you throw away the idea of static overloads? It would have been a powerful feature,
>> and very useful for this case. scanf in C/C++ does not have this problem,
>> because most modern compilers generate warnings for this. But that is making some
>> functions "more equal than the others"
>
> One early version I had was doing that and spelled
>
> readf!"format string"(arguments);
>
> Unfortunately, sometimes runtime-computed formatting strings are needed
> and useful (see the recent std.log discussion...) so I decided to go
> with dynamic formatting for now. Once we get that right, providing an
> optional compile-time-checked formatting function shouldn't be too
> difficult with CTFE.
The problem I see here is that the dynamic version still cannot be checked when
passed a statically known format string.
Why did you drop the idea of allowing something like
int readf(T...)(static string format, T args) ?
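For comparison, the template-argument spelling quoted above can already reject the "mooh" case at compile time. This is a minimal sketch under stated assumptions: countSpecs and checkedReadf are hypothetical names (neither is in Phobos), and the counting rule is deliberately naive, treating every '%' other than "%%" as consuming one argument:

```d
import std.stdio;

// Hypothetical helper, usable through CTFE: counts the specifiers in fmt.
// Naive on purpose: "%%" is a literal percent sign and every other '%'
// counts as one argument. Skipping specifiers such as "%*s" are not handled.
size_t countSpecs(string fmt)
{
    size_t n = 0;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        if (fmt[i] != '%')
            continue;
        if (i + 1 < fmt.length && fmt[i + 1] == '%')
        {
            ++i; // skip the second '%' of "%%"
            continue;
        }
        ++n;
    }
    return n;
}

// Hypothetical wrapper: the format string is a template argument, so a
// mismatch with the number of arguments becomes a compile-time error.
auto checkedReadf(string fmt, T...)(T args)
{
    static assert(countSpecs(fmt) == T.length,
        "format string and argument count do not match");
    return stdin.readf(fmt, args);
}
```

With this, checkedReadf!" %s"(&a) would compile, while checkedReadf!"mooh"(&a) would be rejected by the static assert. It does not help when the same literal is passed to the run-time version, which is the case a 'static string' parameter would cover.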
>
>> 4. readf is slow. It is about 3-4 times slower than scanf (not 2-3, as I
>> mistakenly claimed before). I think this is just a quality of implementation
>> issue, but it is important.
>
> I agree. I'm amazed readf is not slower actually. It uses by character
> file iteration, by far the slowest (and most embarrassing) code I wrote
> in Phobos: each character read entails one call to getc() to fetch the
> character, one call to ungetc() to restore the stream position, and
> finally one more call to getc() to move forward. The code is correct but
> very slow. Some C APIs provide undocumented means to peek at the next
> character in the stream without actually advancing the stream, which is
> what we need. I know how to do it on most Unixen and Walter knows how to
> do it on his own cstdlib implementation. We didn't have the time yet,
> and I'm glad the matter is under spotlight.
>
>> Especially for programming competitions where there are time limits, you do not
>> want IO to unnecessarily become a major bottleneck. (Input files can be huge)
>
> Agreed.
>
>> Other than that, D is WAY the most convenient language I have ever tried to
>> solve small algorithmic tasks in.
>> 5. Not really readf related: There's writef(ln) and there is write(ln). And then
>> there is readf. I will provide a proof-of-concept for the read function soon.
>
> Good idea. I suggest you provide a template read(T)() that mimics the
> functionality of Java's nextInt, nextFloat etc:
>
> auto a = stdin.next!int();
> auto b = stdin.next!double();
> auto s = stdin.next!string("\n"); // read a string up to \n
> ...
>
>
> Andrei
Yes, I think it should support:
auto a = read!int;
auto b = read!double;
auto s = read!string("\n"); // this could be an overload on immutability.
// An alternative would be read!(string,"\n"); I do not know.
auto x = read!(int[])(50); // read an array of 50 integers separated by whitespace
auto y = read!(int[],",")(50); // read an array of 50 integers separated by commas
auto z = read!(int[],", ")(50); // read an array of 50 integers separated by commas and whitespace
Plus the same for every type that can be to!type(string)'d.
But also: read should replace readf wherever possible in the following forms:
int a; double b; string s;
read(a,b,s); // reads whitespace-separated a, b and s in turn (the delimiter could be changed by a template argument or so)
char[] c=new char[1000];
read(c); // only reallocates c if the number of characters read exceeds 1000.
One problem I see: An evildoer could provide a huge input, filling up the whole
RAM. I think this vulnerability is also present in readln. Any ideas?
Non-string arrays are handled this way:
int[100] arr;
read(arr); // reads 100 integers and stores in arr
read(arr[0..20]); //reads 20 integers into the first 20 slots of arr
int[] arr2 = new int[100];
read(arr2); // ditto
Rationale: reading input should not /require/ heap activity.
The read function would cover all cases where no strict whitespace handling is
required, and readf would take the rest! I think that would be a very nice solution.
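To make the proposal concrete, here is a minimal sketch of the value-returning form and the fill-in-place form. The assumptions are loud ones: the names read and readInto are hypothetical, the input is any range of characters (a string here) rather than stdin, and only whitespace is supported as a delimiter:

```d
import std.ascii : isWhite;
import std.conv : to;
import std.range.primitives : empty, front, popFront;

// Hypothetical helper (not part of Phobos): consumes one whitespace-delimited
// token from input and converts it to T with std.conv.to.
// to!T throws a ConvException if the token is empty or malformed.
T read(T, R)(ref R input)
{
    // Skip leading whitespace, scanf-style.
    while (!input.empty && isWhite(input.front))
        input.popFront();

    // Collect characters up to the next whitespace or the end of input.
    char[] token;
    while (!input.empty && !isWhite(input.front))
    {
        token ~= input.front;
        input.popFront();
    }
    return to!T(token);
}

// Hypothetical helper: fills an existing slice in place, so the result
// itself needs no new array (the temporary token buffer still allocates).
void readInto(T, R)(ref R input, T[] dest)
{
    foreach (ref slot; dest)
        slot = read!T(input);
}

unittest
{
    string input = " 1 2.5 word 3 4";
    assert(read!int(input) == 1);
    assert(read!double(input) == 2.5);
    assert(read!string(input) == "word");

    int[2] arr;
    readInto(input, arr[]);
    assert(arr == [3, 4]);
}
```

The token buffer in this sketch still allocates, so it does not yet meet the "no required heap activity" goal; a real implementation would parse directly from the stream.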
Timon