[Dlang-study] [rcstring] Defining rcstring

Wed Feb 3 02:52:22 PST 2016

On Wednesday, February 03, 2016 06:46:53 Михаил Страшун wrote:
> The element type of `byCodeUnit` should be `ubyte`, in my opinion, so that it
> becomes clear that each separate element is not a valid char on its own.

By definition, char is a UTF-8 code unit, wchar is a UTF-16 code unit, and
dchar is a UTF-32 code unit, and code is supposed to be able to assume
that. So, I don't see why it would make sense to use ubyte for a code unit.
We already have types that are explicitly for code units.
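To make that concrete, here's a small sketch (not from the original discussion, and assuming a reasonably recent Phobos). The code-unit types already carry their width, and `byCodeUnit` over a `string` already yields `immutable(char)` rather than `ubyte`, so the type itself says "UTF-8 code unit":

```d
import std.range.primitives : ElementType;
import std.stdio : writeln;
import std.utf : byCodeUnit;

// Each character type is exactly one code unit of its encoding.
static assert(char.sizeof == 1);   // UTF-8 code unit
static assert(wchar.sizeof == 2);  // UTF-16 code unit
static assert(dchar.sizeof == 4);  // UTF-32 code unit

// byCodeUnit on a string iterates code units typed as char, not ubyte.
static assert(is(ElementType!(typeof("abc".byCodeUnit)) == immutable(char)));

void main()
{
    // "Ш" is a single code point but two UTF-8 code units (0xD0 0xA8),
    // so no individual element is a complete character on its own.
    auto units = "Ш".byCodeUnit;
    assert(units.length == 2);
    writeln(units.length); // prints 2
}
```

The fact that a lone `char` may be only part of a character is already implied by its being a *code unit*; switching to `ubyte` would throw away the encoding information without adding any.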

Now, by that same token, having the I/O code use ubyte rather than char (as
you suggested elsewhere in your post) does make a lot of sense, precisely
because there's no guarantee that what's read in is actually UTF-8, and
any code that can't be sure of the encoding really should be using ubyte,
ushort, or uint instead of char, wchar, or dchar. Having the I/O functions
assume UTF-8 was definitely a mistake IMHO, even though it usually works. But
the strings themselves are supposed to be UTF-8, UTF-16, or UTF-32. So, IMHO,
RCString should be operating on chars, wchars, or dchars and not on ubytes,
ushorts, or uints.
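A minimal sketch of the I/O side, assuming a hypothetical helper named `toValidatedUTF8` (it isn't part of Phobos): raw input stays typed as `ubyte` until `std.utf.validate` has confirmed it's well-formed UTF-8, and only then is it handed out as a `string`. The same pattern would presumably apply to `ushort`/`wchar` and `uint`/`dchar` for UTF-16 and UTF-32 input.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

/// Hypothetical helper (not in Phobos): treat raw input as bytes until it has
/// been checked, and only then expose it as UTF-8 text.
string toValidatedUTF8(const(ubyte)[] raw)
{
    // Reinterpret the bytes as UTF-8 code units; this cast does not copy.
    auto text = cast(const(char)[]) raw;
    // Throws UTFException if the bytes are not well-formed UTF-8.
    validate(text);
    // Copy into an immutable string now that the contents are known-good.
    return text.idup;
}

void main()
{
    const(ubyte)[] good = [0xD0, 0xA8];  // "Ш" encoded as UTF-8
    assert(toValidatedUTF8(good) == "Ш");

    const(ubyte)[] bad = [0xFF, 0xFE];   // 0xFF never appears in valid UTF-8
    assertThrown!UTFException(toValidatedUTF8(bad));
}
```

The point of the split is that the `ubyte` → `char` boundary is exactly where the "is this really UTF-8?" question gets answered, so everything past it (including an RCString) can keep using char.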

- Jonathan M Davis