Ranges of char and wchar

Thu May 8 11:19:58 PDT 2014

On Thu, May 08, 2014 at 10:46:12AM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> A discussion is building around
> https://github.com/D-Programming-Language/phobos/pull/2149, which is a
> nice initiative by Walter to allow Phobos users to avoid or control
> memory allocation.
> 
> First instance of the pull request copied the inputs into an output
> range.
> 
> The second instance (right now) creates an input range that lazily
> creates the result.

I've thought about input ranges vs. output ranges for a bit.  I think it
doesn't make sense for functions that process data to take an output
range: output ranges are data sinks, and should only be used for the
endpoint of a data processing pipeline. Since a string function can't
know whether it's the last stage in a pipeline (only the calling code
knows that), it should return an input range. If the user code wants
the result in an output range, it can simply use std.algorithm.copy.
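A minimal sketch of this division of labour. `lazySetExtension` is a hypothetical name, not a Phobos function. It is built from the real `std.path.stripExtension` and `std.range.chain`, assumes `ext` already includes the leading dot, and returns an input range without allocating. The caller then picks the sink (here a `std.array.appender`) and drains the range into it with `std.algorithm.copy`.

```d
import std.algorithm : copy, equal;
import std.array : appender;
import std.path : stripExtension;
import std.range : chain;
import std.stdio : writeln;

// Hypothetical lazy counterpart of setExtension: yields the stem of `path`
// followed by `ext` as an input range, without allocating.
// Assumes `ext` already includes the leading dot.
auto lazySetExtension(string path, string ext)
{
    return chain(path.stripExtension, ext);
}

// The caller, not the string function, decides where the data ends up.
string toNewString(string path, string ext)
{
    auto sink = appender!string();           // an output range
    lazySetExtension(path, ext).copy(sink);  // copy drains the input range into it
    return sink.data;
}

void main()
{
    // Consumers that accept ranges need no buffer at all...
    writeln(lazySetExtension("report.txt", ".md")); // report.md
    // ...and when a string really is needed, the caller asks for one.
    writeln(toNewString("report.txt", ".md"));      // report.md
}
```

Because the helper never sees the sink, the same range can just as well go straight to `writeln`, as `main` does.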

This way, you maximize the usability of the function -- it can
participate in UFCS chains, compose with other std.algorithm functions,
etc.
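To illustrate the UFCS point, here is a sketch using a hypothetical lazy helper of the same shape, defined in the block so it stands alone. The renaming step slots into a `filter`/`map`/`joiner` pipeline like any other range function, and no intermediate string is built until something consumes the result.

```d
import std.algorithm : equal, filter, joiner, map;
import std.path : extension, stripExtension;
import std.range : chain;
import std.stdio : writeln;

// Hypothetical lazy helper (not a Phobos function): the stem of `path`
// followed by `ext`, as an input range. Assumes `ext` includes the dot.
auto lazySetExtension(string path, string ext)
{
    return chain(path.stripExtension, ext);
}

// Renames every ".c" file to ".d" and joins the names with spaces.
// Every step is lazy, so no intermediate string is materialized.
auto renameSources(string[] files)
{
    return files
        .filter!(f => f.extension == ".c")
        .map!(f => f.lazySetExtension(".d"))
        .joiner(" ");
}

void main()
{
    writeln(renameSources(["a.c", "notes.txt", "b.c"])); // a.d b.d
}
```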

[...]
> We need a robust idiom for doing such string manipulation without
> allocation, for which setExtension is just an example. Going the
> output range route has nice things going for it because the output
> range decides the encoding in advance and then accepts via put() calls
> any encoding, with only the minimum transcoding needed.
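The transcoding property described in the quote can be seen with `std.array.appender`, used here purely for illustration; the pull request may well use a different output range. The sink's element type fixes the target encoding as UTF-8. `put` accepts UTF-8, UTF-16 and UTF-32 input, and only the non-UTF-8 pieces get re-encoded.

```d
import std.array : appender;
import std.stdio : writeln;

// The sink's element type (immutable char) fixes the output encoding as UTF-8.
// Appender.put accepts strings of any width: UTF-8 input is appended as-is,
// while UTF-16 and UTF-32 input is decoded and re-encoded as UTF-8.
string assemble()
{
    auto sink = appender!string();
    sink.put("dir/");   // string  (UTF-8)
    sink.put("café"w);  // wstring (UTF-16)
    sink.put(".ext"d);  // dstring (UTF-32)
    return sink.data;
}

void main()
{
    writeln(assemble()); // dir/café.ext
}
```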

The problem with this approach is that it hampers usage in UFCS
pipelines.

> However output range means the string operation will be done eagerly,
> whereas lazy has advantages (nice piping, saving on work etc).
> 
> On the other hand, there's the risk of becoming "more catholic than
> the Pope" by insisting on lazy string processing. Most string
> operations are eager, and insisting on a general framework for lazy
> encoded operations on strings may be an exaggeration.
[...]
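The "saving on work" point can be made concrete with a small sketch; the counter and function name are made up for illustration. A lazy `map` over a string converts nothing up front, and consuming only a prefix with `std.range.take` converts only that prefix.

```d
import std.algorithm : map;
import std.range : take;
import std.stdio : writeln;
import std.uni : toUpper;

size_t converted; // illustrative counter: code points the mapping function has touched

// Upper-cases only the first n code points of text. The map is lazy, so
// nothing is converted until the foreach loop pulls elements from take(n).
string upperPrefix(string text, size_t n)
{
    converted = 0;
    auto upper = text.map!((dchar c) { ++converted; return toUpper(c); });
    string result;
    foreach (c; upper.take(n))
        result ~= c; // appending a dchar encodes it as UTF-8
    return result;
}

void main()
{
    writeln(upperPrefix("hello, world, and a much longer tail", 5)); // HELLO
    writeln(converted);                                              // 5
}
```

An eager version would have upper-cased the whole string before the caller looked at the first five characters.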

In terms of usability, my opinion is that it makes most sense to return
an input range. Let the user decide when the result should be copied
into an output range (via std.algorithm.copy).

Compare the following for constructing a path from a directory name, a
filename, and an extension:

Case 1: setExtension takes an output range:

	// Look how ugly this is:
	string dirname = ...;
	string filename = ...;

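	// Need temp buffer to store result
	char[128] result;
	char[] outputRange = result[];

	// std.algorithm.copy returns the unfilled tail, so we must track it
	outputRange = dirname.copy(outputRange);
	// (assumes the proposed setExtension advances outputRange, e.g. by ref)
	setExtension(filename, ".ext", outputRange);

	// Only the filled prefix of the buffer holds the result
	writeln(result[0 .. $ - outputRange.length]);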

Case 2: setExtension takes an input range:

	// Look how clean this is:
	string dirname = ...;
	string filename = ...;

	writeln(chain(dirname, setExtension(filename, ".ext")));
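For comparison, here is a self-contained version of both cases with placeholder inputs. `setExtensionInto` and `lazySetExtension` are hypothetical stand-ins for the two proposed shapes of setExtension, not the actual code from the pull request. The output-range variant returns the unfilled tail of the buffer, mirroring what `std.algorithm.copy` returns, because otherwise the caller cannot tell how much was written.

```d
import std.algorithm : copy, equal;
import std.path : stripExtension;
import std.range : chain;
import std.stdio : writeln;

// Hypothetical output-range variant: writes into buf and returns the
// unfilled tail. Assumes ext already includes the leading dot.
char[] setExtensionInto(string path, string ext, char[] buf)
{
    buf = path.stripExtension.copy(buf); // copy requires buf to be long enough
    return ext.copy(buf);
}

// Hypothetical lazy variant: an input range, no allocation.
auto lazySetExtension(string path, string ext)
{
    return chain(path.stripExtension, ext);
}

// Case 1: the caller sizes a buffer, threads the tail through each step,
// and slices out the filled prefix at the end.
string outputRangeStyle(string dirname, string filename)
{
    char[128] result;
    char[] rest = result[];
    rest = dirname.copy(rest);
    rest = setExtensionInto(filename, ".ext", rest);
    return result[0 .. $ - rest.length].idup; // copy out of the stack buffer
}

// Case 2: compose the ranges; nothing is copied until a consumer asks.
auto inputRangeStyle(string dirname, string filename)
{
    return chain(dirname, lazySetExtension(filename, ".ext"));
}

void main()
{
    writeln(outputRangeStyle("dir/", "file.txt")); // dir/file.ext
    writeln(inputRangeStyle("dir/", "file.txt"));  // dir/file.ext
}
```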

In case 1, the user has to create buffers by hand to hold intermediate
results. I used a trivial example here, but in application code, the
processing you need is usually far more complex. This means creating
lots of intermediate buffers, making sure you link the right ones
together, etc. The code becomes verbose and turns into a maintenance
nightmare (which of the tmp1, tmp2, tmp3 buffers refer to which fragment
of the result again? Oh oops, I think I passed the wrong output range to
setExtension).

In case 2, the user decides when a buffer is needed and when it's not.
The function calls chain together nicely. The code is more readable and
easier to maintain, and needless allocations (including temporary static
buffers) are eliminated.

T

-- 
Nearly all men can stand adversity, but if you want to test a man's character, give him power. -- Abraham Lincoln