Checking function parameters in Phobos
Jonathan M Davis
jmdavisProg at gmx.com
Wed Nov 20 03:20:37 PST 2013
On Wednesday, November 20, 2013 11:45:57 Lars T. Kyllingstad wrote:
> On Wednesday, 20 November 2013 at 00:01:00 UTC, Andrei Alexandrescu wrote:
> > (c) A variety of text functions currently suffer because we
> > don't make the difference between validated UTF strings and
> > potentially invalid ones.
>
> I think it is fair to always assume that a char[] is a valid
> UTF-8 string, and instead perform the validation when
> creating/filling the string from a non-validated source.
That doesn't work when strings are being created via concatenation and the
like inside the program rather than simply coming from outside the program.
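
A minimal sketch of how that can happen, assuming std.utf.validate is the check being applied. The helpers isValidUtf8 and buildBroken are hypothetical names made up for this illustration. It uses slicing plus concatenation, since slicing a string works on code units and can cut a multi-byte character in half:

```d
import std.utf : UTFException, validate;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Hypothetical helper: builds a string purely inside the program,
// with no outside input involved.
string buildBroken()
{
    string e = "\u00E9";        // U+00E9: one code point, two code units (0xC3 0xA9)
    return "caf" ~ e[0 .. 1];   // slicing is per code unit, so only 0xC3 is kept
}

void main()
{
    assert(isValidUtf8("caf\u00E9"));     // the intact string is well-formed
    assert(!isValidUtf8(buildBroken()));  // the concatenated result is not
}
```

Every input here was valid when it entered the program, so validating only at the boundary would not have caught the broken result.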
> Take std.file.read() as an example; it returns void[], but has a
> validating counterpart in std.file.readText().
>
> I think we should use ubyte[] to a greater extent for data which
> is potentially *not* valid UTF.
Well, we've already discussed the possibility of using ubyte[] to indicate
ASCII strings, and that makes a lot more sense IMHO, because then no decoding
occurs (which is precisely what you want for ASCII), whereas with a string
that's potentially invalid UTF, it's not that we don't want to decode it. It's
just that we need to validate it when decoding it.
So, I'd argue that ubyte[] should be used when you want to operate on code
units instead of code points. It shouldn't have anything to do with
validating code points.
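
A small sketch of the code unit versus code point distinction, assuming Phobos's current behavior of decoding narrow strings to dchar in range functions. The helpers codePoints and codeUnits are hypothetical names for this illustration. std.string.representation gives the ubyte[] view of the same data:

```d
import std.range : walkLength;
import std.string : representation;

// Hypothetical helper: counts code points.
// Range functions decode a string to dchar as they walk it.
size_t codePoints(string s)
{
    return s.walkLength;
}

// Hypothetical helper: counts code units.
// The ubyte[] view is a plain array, so nothing is decoded.
size_t codeUnits(string s)
{
    immutable(ubyte)[] units = s.representation;
    return units.walkLength;
}

void main()
{
    string s = "na\u00EFve";               // U+00EF takes two UTF-8 code units
    assert(codePoints(s) == 5);
    assert(codeUnits(s) == 6);
    assert(s.representation[2] == 0xC3);   // first code unit of U+00EF
}
```

Neither view validates anything up front. The difference is only whether decoding happens at all.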
- Jonathan M Davis