Minor std.stdio.File.ByLine rant

Wed Feb 26 15:44:10 PST 2014

I'm writing a CLI program that uses File.ByLine to read input commands,
with optional prompting (if run in interactive mode). One would imagine
that this should be a natural use for ByLine (perhaps not as common
nowadays with the rampant GUI fanboyism, but it still happens in some
niches), but it is fraught with peril.

First of all, the way ByLine works is kinda tricky, even in previous
releases. The root cause is that, at least on Posix, the underlying
C feof() call doesn't actually tell you whether you're really at EOF
until you try to read something from the stream. I know there
are good reasons for this, but this special case percolates up the
standard library code and causes a problem with D's input range
primitives, where .empty must tell the caller, right now, whether data
is available, *before* .front ever returns anything.
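
To make the feof() behaviour concrete, here is a small sketch using the C stdio bindings from core.stdc.stdio. The helper name eofBeforeAndAfterRead is made up for this example, and an empty temporary file stands in for an input stream that has no data.

```d
import core.stdc.stdio : EOF, FILE, fclose, feof, fgetc, tmpfile;
import std.stdio : writeln;

/// Hypothetical helper: returns [feof before any read, feof after a failed read]
/// for an empty stream.
bool[2] eofBeforeAndAfterRead()
{
    FILE* fp = tmpfile();               // empty temporary file
    assert(fp !is null, "tmpfile() failed");
    scope (exit) fclose(fp);

    immutable before = feof(fp) != 0;   // nothing has been read yet
    immutable c = fgetc(fp);            // this read is what discovers EOF
    assert(c == EOF);
    immutable after = feof(fp) != 0;    // the EOF indicator is set only now
    return [before, after];
}

void main()
{
    writeln(eofBeforeAndAfterRead());   // prints [false, true]
}
```

An input range's .empty cannot get away with the "before" answer, which is why some kind of read-ahead ends up inside ByLine.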

At one time, this problem was worked around by issuing a single fgetc on
the underlying C stream in ByLine's .empty method to determine its EOF
state, and then doing an ungetc to put the char back into the stream.
However, this code is a rather ugly hack, and causes the problem that
when the interactive program needs to output a prompt before blocking on
input, it has to do so *before* it calls ByLine.empty (since otherwise
.empty blocks and the prompt doesn't get printed until after the user
has hit Enter -- clearly unacceptable for an interactive shell program).
If the stream turns out empty after all, then the prompt is already
output, and there's no way to take it back, so an extraneous prompt is
always written.
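
For illustration, a minimal sketch of that kind of peek-and-push-back check, written against the C stdio bindings. It is not the actual Phobos source, and atEof is a made-up name. On an interactive stream the fgetc call is where the program would block, so any prompt has to be written before calling it.

```d
import core.stdc.stdio : EOF, FILE, fclose, fgetc, fputs, rewind, tmpfile, ungetc;
import std.stdio : writeln;

/// Hypothetical helper: reports EOF by reading one char and pushing it back.
bool atEof(FILE* fp)
{
    immutable c = fgetc(fp);    // blocks here if fp is an idle terminal
    if (c == EOF)
        return true;
    ungetc(c, fp);              // put the peeked char back into the stream
    return false;
}

void main()
{
    FILE* fp = tmpfile();       // temporary file standing in for an input stream
    assert(fp !is null, "tmpfile() failed");
    scope (exit) fclose(fp);

    fputs("hi\n", fp);
    rewind(fp);

    writeln(atEof(fp));             // prints false: data is available
    writeln(cast(char) fgetc(fp));  // prints h: the peeked char was pushed back
}
```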

Understandably, the ungetc hack was subsequently removed from Phobos and
replaced by caching the next line the first time .empty was called,
which eliminated the ugliness of ungetc and allowed existing code to
continue working as before.

Then recently, and also understandably, caching things in .empty was
frowned upon, so the caching was removed from .empty altogether and
pushed into the ByLine ctor. From the standpoint of Phobos code, this is
perhaps the ideal solution: the ctor reads the stream to get the first
line and simultaneously determine the EOF status of the stream, and
there is no need for ugly boolean state flags, ungetc ugliness, and
generally unpleasant code.

However, what happens is that now, ByLine will block on input *upon
construction*. This is rather unpleasant when your program needs to do
something like this:

	void main() {
		string prompt;
		...
		ByLine!char input;
		if (useStandardInput) {
			input = stdin.byLine();	// ctor already blocks here, before any prompt
		} else if (useScriptFile) {
			input = File(filename).byLine();
		}
		...
		if (mode == ProgramMode.modeA) { // mode is an enum
			runModeA(input);
		} else {
			runModeB(input);
		}
	}

	void runModeA(ByLine!char input) {
		write("modeA> ");	// display prompt
		while (!input.empty) {
			...
		}
	}

	void runModeB(ByLine!char input) {
		write("modeB> ");	// display prompt
		while (!input.empty) {
			...
		}
	}

The problem is, when input is initialized, we don't know what prompt to
use yet, but ByLine's ctor will already block when it tries to read from
stdin!

The current workaround I implemented is to use a wrapper around ByLine
that lazily constructs it when .empty is called.
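
A minimal sketch of what such a wrapper could look like, assuming a Phobos release in which byLine() reads eagerly on construction as described above (other releases may behave differently). LazyByLine is a hypothetical name, not a Phobos type, and this is not necessarily the wrapper actually used.

```d
import std.stdio;

/// Hypothetical wrapper (not part of Phobos): holds on to the File and only
/// calls byLine() the first time a range primitive is used.
struct LazyByLine
{
    private File file;
    private typeof(File.init.byLine()) lines; // whatever type byLine() returns
    private bool started;

    this(File f) { file = f; }

    private void start()
    {
        if (!started)
        {
            lines = file.byLine(); // any blocking read is deferred to this point
            started = true;
        }
    }

    @property bool empty() { start(); return lines.empty; }
    @property char[] front() { start(); return lines.front; }
    void popFront() { start(); lines.popFront(); }
}

void main()
{
    auto input = LazyByLine(stdin); // nothing is read from stdin yet
    write("modeA> ");               // so the prompt can go out first
    stdout.flush();
    while (!input.empty)
    {
        writeln("got: ", input.front);
        input.popFront();
        write("modeA> ");
        stdout.flush();
    }
}
```

One caveat with this sketch: a copy made before the first use constructs its own ByLine later, so it is safest to hand a single instance around (by ref, say) rather than iterating several copies.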

Who knew something as simple as an interactive prompting program that
reads input lines could turn into such a nightmare when ByLine is used?

:-(

T

-- 
What is Matter, what is Mind? Never Mind, it doesn't Matter.