Checking function parameters in Phobos

Wed Nov 20 03:20:37 PST 2013

On Wednesday, November 20, 2013 11:45:57 Lars T. Kyllingstad wrote:
> On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu
> wrote:
> > (c) A variety of text functions currently suffer because we
> > don't make the difference between validated UTF strings and
> > potentially invalid ones.
> 
> I think it is fair to always assume that a char[] is a valid
> UTF-8 string, and instead perform the validation when
> creating/filling the string from a non-validated source.

That doesn't work when strings are built inside the program, via
concatenation and the like, rather than simply coming in from outside it.
In that case there is no single entry point at which to validate.
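
To illustrate, here is a minimal sketch (not taken from Phobos; the variable
names are made up for the example) of how purely internal slicing and
concatenation can turn validated input into invalid UTF-8. It only assumes
std.utf.validate, which throws a UTFException on malformed input:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    string s = "\u00E9";     // 'é', encoded as the two code units 0xC3 0xA9
    validate(s);             // passes: s is well-formed UTF-8

    // Slicing by code unit compiles fine and splits the code point in half.
    string head = s[0 .. 1]; // just 0xC3, a lead byte with no continuation
    string joined = "caf" ~ head ~ "!";

    // No outside data was involved, yet the result is not valid UTF-8.
    assertThrown!UTFException(validate(joined));
}
```

Validating s when it entered the program would not have caught this, since
the invalid sequence only appears after the slice and the concatenation.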

> Take std.file.read() as an example; it returns void[], but has a
> validating counterpart in std.file.readText().
> 
> I think we should use ubyte[] to a greater extent for data which
> is potentially *not* valid UTF.

Well, we've already discussed the possibility of using ubyte[] to indicate
ASCII strings, and that makes a lot more sense IMHO, because then no decoding
occurs (which is precisely what you want for ASCII). With a string that's
potentially invalid UTF, it's not that we don't want to decode it; it's just
that we need to validate it while decoding it.

So, I'd argue that ubyte[] should be used when you want to operate on code
units instead of code points, not as a marker for whether the code points
have been validated.
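
As a rough sketch of that distinction (just an illustration, assuming
std.string.representation and std.range.walkLength behave as I describe in
the comments), the same data can be viewed either way:

```d
import std.range : walkLength;
import std.string : representation;

void main()
{
    string s = "h\u00E9llo";           // 'é' takes two UTF-8 code units

    // Range functions decode a string into code points (dchar).
    assert(s.walkLength == 5);

    // representation() reinterprets the same memory as immutable(ubyte)[],
    // so range functions see raw code units and never decode.
    immutable(ubyte)[] units = s.representation;
    assert(units.walkLength == 6);
    assert(units[1] == 0xC3 && units[2] == 0xA9);
}
```

Nothing here says whether the data is valid UTF-8; the ubyte[] view only
changes the element type the algorithms operate on.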

- Jonathan M Davis