Phobos file input (Was: Re: List of issues PVS-Studio statically analyzes for )

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Jul 24 13:02:00 PDT 2011


On 7/24/11 11:17 AM, Timon Gehr wrote:
> Walter Bright wrote:
>> On 7/23/2011 4:10 AM, bearophile wrote:
>>> D doesn't currently catch errors coming from [...]
>>> bad usage of core.stdc functions (like strlen, printf, etc).
>>
>>
>> This isn't likely to happen. D's mission isn't to try and fix usage of C functions.
>
> Note that currently, unsafe cstdio functions are often faster than Phobos stdio
> functions by a factor large enough to force people to use the C functions for IO
> bound tasks.

Only a fraction of them.

> I don't know how many people on this newsgroup are affected by this fact. What is
> important to note is: This issue is a blocker for using D as a teaching language
> at universities.

Unless you're teaching heavy-duty data I/O, even considerably slower I/O 
speed should not affect using a language for teaching. I'd be curious to 
hear more detail (and which university are you teaching, or considering 
teaching, D at?), thanks.

> std.stdio is completely compatible with cstdio functions, but that is both a
> benefit and a drawback:
> - cstdio can be used at no extra cost, very nice if you need it.
>
> - Phobos input functionality (mainly readf) is slowed down, since it cannot use an
> internal buffer.

That is correct. I know how to fix that on all supported OSs, but never 
got around to it. So much to do, sigh.

> -- This even applies when cstdio is not used at all!

Yah, I think that's not surprising.

> -- a large part of the inefficiency of readf may be caused by the range
> abstraction for files: std.stdio.LockingTextReader.empty looks like a bottleneck.

This is because a range essentially exposes a one-element buffer 
explicitly (via front()). Ironically, C's stdlib _also_ has a range of 
one element, but that's exposed in a very inefficient and indirect way: 
you can call getc() to destructively fetch one character, but you can 
then put it back with ungetc() AND ungetc() is GUARANTEED to succeed at 
least once. This means FILE* does have a one-character buffer even for 
unbuffered streams, which can be easily seen by analyzing the 
implementation of various stdlibs. Some actually offer a private 
function peek() that lets code "see" the next character in the stream. 
That would help the range interface. Currently 
std.stdio.LockingTextReader.empty calls (a variant of) getc, stores the 
character, and then calls (a variant of) ungetc(). The "variant of" is 
the unlocked version, so the code is already unportable. It's also slow, 
and we can fix it to be faster in unportable ways.
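
For illustration, the getc/ungetc round trip described above can be 
sketched in portable C. This is only a sketch, not the actual Phobos 
code: peek_char and stream_empty are hypothetical names, and the sketch 
uses the ordinary locked getc/ungetc rather than the unlocked variants 
mentioned above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of any stdlib): look at the next
   character without consuming it. Returns EOF at end of input. */
int peek_char(FILE *f)
{
    assert(f != NULL);
    int c = getc(f);      /* destructively fetch one character */
    if (c != EOF)
        ungetc(c, f);     /* one character of pushback is guaranteed */
    return c;
}

/* Range-style "empty" test built on the peek: two library calls
   per query, which is where the cost comes from. */
int stream_empty(FILE *f)
{
    return peek_char(f) == EOF;
}
```

Every call to stream_empty pays for a getc and an ungetc, which is the 
overhead a direct peek into the FILE buffer would avoid.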

> -- C++ iostreams 'solves' it with ios::sync_with_stdio(false);
>
> - Another more fundamental issue is that D IO cannot be atomic. There is no way to
> implement a function that leaves the input untouched if it is invalid, and still
> is compatible with cstdio.
>
> Eg:
> try a = read!int(); // oops, input is actually "abc"
> catch(...){s = read!string();} // get malformed input

That's a bug, read should leave unmatched characters in place.
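
To make "leave unmatched characters in place" concrete, here is a 
minimal sketch in C. It assumes the input has already been read into a 
memory buffer, since stdio itself only guarantees one character of 
pushback. try_read_int is a hypothetical helper, not an existing Phobos 
or C library function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: try to parse a decimal int at *cursor.
   On success, store the value in *out, advance *cursor past the
   number, and return 1. On failure, return 0 and leave *cursor
   untouched so the caller can retry with another type. */
int try_read_int(const char **cursor, int *out)
{
    assert(cursor != NULL && *cursor != NULL && out != NULL);
    char *end;
    errno = 0;
    long v = strtol(*cursor, &end, 10);
    if (end == *cursor)       /* no digits found: not a number */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;             /* out of range for int */
    *out = (int)v;
    *cursor = end;            /* commit only after a successful parse */
    return 1;
}
```

With the input "abc", the call fails and the cursor still points at 
"abc", so a following string read sees the whole malformed token.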

> Currently in case of ill-formed input, formattedRead leaves the InputRange (which
> is real file input in the case of readf) in whatever position the error occurred,
> and I'm not sure if this is even specified anywhere. It is almost useless for
> error handling.

Agreed.

> So, if D/Phobos basically forces usage of C functions then its job would actually
> be to fix their usage. Otherwise, this is an open design issue.
>
> Any thoughts on how to improve the current situation? I think Phobos should get
> _input_ right eventually. (and output too, what is the state of the toString issue?)

I have all the knowledge (much of it shared above) but no time. If 
anyone wants to take this on, I'd be glad to answer detailed questions.


Andrei


