Checking function parameters in Phobos

Lars T. Kyllingstad public at kyllingen.net
Wed Nov 20 02:45:57 PST 2013


On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei 
Alexandrescu wrote:
> (c) A variety of text functions currently suffer because we 
> don't make the difference between validated UTF strings and 
> potentially invalid ones.

I think it is fair to always assume that a char[] is a valid 
UTF-8 string, and instead perform the validation when 
creating/filling the string from a non-validated source.

Take std.file.read() as an example; it returns the raw file 
contents as void[] without any validation, but has a validating 
counterpart in std.file.readText().
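
To make the read()/readText() distinction concrete, here is a 
minimal sketch.  The helper names readBytes and isValidUtf8File 
are hypothetical and only serve as illustration; the sketch 
relies on readText() throwing a UTFException when the file 
contents are not valid UTF.

```d
import std.file : read, readText;
import std.utf : UTFException;

// Hypothetical helper: returns the raw file contents as bytes.
// read() performs no validation, so the result may be anything.
ubyte[] readBytes(string path)
{
    return cast(ubyte[]) read(path);
}

// Hypothetical helper: reports whether the file holds valid UTF-8.
// readText() validates the data and throws UTFException otherwise.
bool isValidUtf8File(string path)
{
    try
    {
        readText(path);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

With this split, the validation happens exactly once, at the 
point where untrusted bytes become a string.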

I think we should use ubyte[] to a greater extent for data which 
is potentially *not* valid UTF.  One example is interfacing with 
C functions, where I think there is a tendency towards always 
translating C char to D char, even though the two are not 
equivalent: a C char is just a byte with no guaranteed encoding, 
while a D char is a UTF-8 code unit.  Another example is, again, 
std.file.read(), which currently returns void[].  I guess it is a 
matter of taste, but I think ubyte[] would be more appropriate 
here, since you can actually use it for something without casting 
it first.
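
As a rough illustration of the C interfacing case, a wrapper 
could hand back ubyte[] instead of char[] and leave validation to 
the caller.  The function name fromCString is hypothetical, and 
the sketch assumes the C side returns a NUL-terminated buffer.

```d
import core.stdc.string : strlen;

// Hypothetical helper: copies a NUL-terminated C string into a
// ubyte[] without claiming anything about its encoding.
ubyte[] fromCString(const(char)* cstr) @system
{
    if (cstr is null)
        return null;

    // Reinterpret the C chars as plain bytes, then copy them so the
    // result does not alias memory owned by the C library.
    auto bytes = cast(const(ubyte)*) cstr;
    return bytes[0 .. strlen(cstr)].dup;
}
```

The caller then decides whether the bytes should be validated as 
UTF-8, transcoded from some other encoding, or used as they are.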

The transition from string to ubyte[] is already made simple by 
std.string.representation.  We should offer an equally simple and 
convenient way to do the opposite transformation.  In one of my 
current projects, I am using this function:

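   // Reinterprets raw bytes as UTF-8 text without copying.
   // Throws a UTFException if the data is not valid UTF-8.
   inout(char)[] asString(inout(ubyte)[] data) @safe pure
   {
     import std.utf: validate;
     auto s = cast(typeof(return)) data;
     validate(s);
     return s;
   }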

This could easily be written as a template, to accept wider 
encodings as well, and I think it would be a nice addition to 
Phobos.
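
A templated version might look roughly like the sketch below.  
This is only one possible design, not a proposal for the exact 
Phobos signature: the mapping of ubyte/ushort/uint to 
char/wchar/dchar is my assumption, the function attributes are 
left to inference, and only mutable, const and immutable input is 
handled.

```d
import std.utf : validate;

// Sketch: reinterpret an array of code units as the matching
// string type and validate it.  Throws UTFException on bad data.
auto asString(T)(T[] data)
if (is(immutable(T) == immutable(ubyte)) ||
    is(immutable(T) == immutable(ushort)) ||
    is(immutable(T) == immutable(uint)))
{
    // Pick the character type with the same size as the code unit.
    static if (T.sizeof == 1)      alias C = char;
    else static if (T.sizeof == 2) alias C = wchar;
    else                           alias C = dchar;

    // Carry the qualifier of the input over to the result.
    static if (is(T == immutable))  alias Q = immutable(C);
    else static if (is(T == const)) alias Q = const(C);
    else                            alias Q = C;

    auto s = cast(Q[]) data;   // no copy, same memory
    validate(s);
    return s;
}
```

Together with std.string.representation this gives a round trip: 
representation() goes from string to immutable(ubyte)[], and 
asString() goes back, validating on the way.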

Lars

