Checking function parameters in Phobos
Lars T. Kyllingstad
public at kyllingen.net
Wed Nov 20 02:45:57 PST 2013
On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei
Alexandrescu wrote:
> (c) A variety of text functions currently suffer because we
> don't make the difference between validated UTF strings and
> potentially invalid ones.
I think it is fair to always assume that a char[] is a valid
UTF-8 string, and instead perform the validation when
creating/filling the string from a non-validated source.
Take std.file.read() as an example; it returns void[], but has a
validating counterpart in std.file.readText().
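To make the read()/readText() split concrete, here is a minimal sketch. The helper readTextRejects and the scratch file name are made up for this example and are not part of Phobos. It assumes readText() reports malformed input by throwing std.utf.UTFException.

```d
import std.exception : collectException;
import std.file : read, readText, remove, write;
import std.utf : UTFException;

/// Writes `bytes` to a scratch file and reports whether readText() rejects it.
/// Hypothetical helper; the file name is a placeholder chosen for this example.
bool readTextRejects(const(ubyte)[] bytes)
{
    enum path = "utf_demo.tmp";
    write(path, bytes);
    scope (exit) remove(path);

    // read() performs no validation and hands back untyped bytes.
    void[] raw = read(path);
    assert(raw.length == bytes.length);

    // readText() validates the content and throws on malformed UTF-8.
    return collectException!UTFException(readText(path)) !is null;
}

void main()
{
    assert(!readTextRejects([0x48, 0x69]));      // "Hi" is valid UTF-8
    assert(readTextRejects([0x48, 0x69, 0xFF])); // 0xFF never occurs in UTF-8
}
```

The point is that the validation happens once, at the boundary where the bytes enter the program, and everything downstream can treat the result as valid.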
I think we should use ubyte[] to a greater extent for data which
is potentially *not* valid UTF. Examples include interfacing
with C functions, where there is a tendency to always translate
C char to D char, even though the two are not equivalent: a C
char carries no encoding guarantee. Another example is, again,
std.file.read(), which currently returns void[]. I guess it is a
matter of taste, but I think ubyte[] would be more appropriate
here, since you can actually use it for something without
casting it first.
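As a small illustration of the casting point, the sketch below sums the bytes of a file. The function byteSum and the file name are hypothetical and exist only for this example.

```d
import std.file : read, remove, write;

/// Adds up every byte in a file. Hypothetical helper, for illustration only.
uint byteSum(string path)
{
    void[] raw = read(path);
    // `raw[0]` would not compile here: an element of a void[] has no value.
    // This is the cast that a ubyte[] return type would make unnecessary.
    auto bytes = cast(const(ubyte)[]) raw;

    uint sum = 0;
    foreach (b; bytes)
        sum += b;
    return sum;
}

void main()
{
    enum path = "bytes_demo.tmp"; // placeholder file name
    ubyte[] data = [1, 2, 3];
    write(path, data);
    scope (exit) remove(path);

    assert(byteSum(path) == 6);
}
```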
The transition from string to ubyte[] is already made simple by
std.string.representation. We should offer an equally simple and
convenient way to do the opposite transformation. In one of my
current projects, I am using this function:
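
    inout(char)[] asString(inout(ubyte)[] data) @safe pure
    {
        import std.utf: validate;

        // Reinterpret the bytes as UTF-8 code units; no copy is made.
        auto s = cast(typeof(return)) data;
        // Throws a UTFException if the data is not well-formed UTF-8.
        validate(s);
        return s;
    }
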
This could easily be written as a template, to accept wider
encodings as well, and I think it would be a nice addition to
Phobos.
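A rough sketch of what such a template might look like follows. This is only one possible shape, not a proposed Phobos API: the mapping template CharFor is a name invented here, and the sketch assumes that ubyte, ushort and uint stand for UTF-8, UTF-16 and UTF-32 code units respectively, mirroring what std.string.representation returns.

```d
import std.exception : assertThrown;
import std.string : representation;
import std.traits : CopyTypeQualifiers, Unqual;
import std.utf : UTFException, validate;

/// Maps a code-unit storage type to its character type.
/// Hypothetical helper, not part of Phobos.
template CharFor(T)
{
    static if (is(Unqual!T == ubyte))
        alias CharFor = char;
    else static if (is(Unqual!T == ushort))
        alias CharFor = wchar;
    else static if (is(Unqual!T == uint))
        alias CharFor = dchar;
    else
        static assert(false, "no character type for " ~ T.stringof);
}

/// Reinterprets `data` as a string of the matching width, keeping the
/// element qualifiers, and validates it before returning.
CopyTypeQualifiers!(T, CharFor!T)[] asString(T)(T[] data)
{
    auto s = cast(typeof(return)) data;
    validate(s); // throws UTFException on malformed input
    return s;
}

void main()
{
    // Round trip: string -> bytes -> string.
    immutable(ubyte)[] bytes = "héllo".representation;
    string s = asString(bytes);
    assert(s == "héllo");

    // Wider code units yield the wider string type.
    ushort[] units = [0x0048, 0x0069];
    wchar[] w = asString(units);
    assert(w == "Hi"w);

    // Invalid input is rejected at the conversion boundary.
    ubyte[] bad = [0xFF];
    assertThrown!UTFException(asString(bad));
}
```

Taking T[] rather than inout(T)[] lets the template copy the qualifier explicitly, so an immutable(ubyte)[] comes back as a string and a mutable ubyte[] as a char[].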
Lars