ubyte vs. char for non-UTF-8 (was Re: toString vs. toUtf8)

Wed Nov 21 03:02:28 PST 2007

Matti Niemenmaa wrote:
> Regan Heath wrote:
> The point is that storing non-UTF data in ubyte/ushort/uint is a difficult task
> because even the C functions take char (or wchar_t, which I think is wchar on
> Windows and dchar elsewhere) and thus the code quickly becomes castville. cast
> here, cast there, everywhere a cast cast - and for no good reason.

Yeah, agreed 100%.
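
A minimal sketch of the casting problem, assuming present-day D and
druntime's core.stdc.string rather than the Phobos of this thread. The
helper name latin1Length is hypothetical, made up for illustration:

```d
import core.stdc.string : strlen;

/// Hypothetical helper: length of a NUL-terminated Latin-1 buffer.
size_t latin1Length(const(ubyte)[] data)
{
    // Assumes the caller stored a terminating 0 byte inside data.
    // strlen is declared as taking const(char)*, so the ubyte pointer
    // has to be cast even though no UTF-8 is involved.
    return strlen(cast(const(char)*) data.ptr);
}
```

Every other C string function called on the same buffer would need the
same cast, which is how the casts pile up.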

> Thus I believe, as per my original proposal, that library functions be converted
> to use ubyte[] where they are not meant to accept char[]. This may or may not
> mean changes in std.string - it's up to the Phobos maintainers to make the
> choice as to whether a function will ever require UTF-8, and whether to type it
> as taking char[] or ubyte[]. In any case, at least the C functions should take
> ubyte[].

Agreed.  I would tend to leave the std.string functions taking char[], so 
that when they finally step up and have complete UTF compatibility their 
signatures do not change.  If we need some functions, like strip, as a 
stopgap for other encodings, then I reckon we add them, perhaps to a 
different module, and use ubyte* (or whatever) instead of char[] for 
the input parameter.
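
A rough sketch of what such a stopgap might look like, assuming
present-day D and a ubyte[] slice instead of ubyte*. The names
isAsciiSpace and stripBytes are hypothetical and not part of Phobos, and
the sketch only treats the ASCII whitespace bytes as strippable, which
may not suit every encoding:

```d
/// Hypothetical helper: true for the ASCII whitespace bytes
/// (space, \t, \n, \v, \f, \r).
bool isAsciiSpace(ubyte b)
{
    return b == ' ' || (b >= '\t' && b <= '\r');
}

/// Hypothetical stopgap: strips ASCII whitespace from both ends of a
/// byte string without assuming the bytes are UTF-8.
inout(ubyte)[] stripBytes(inout(ubyte)[] s)
{
    size_t lo = 0;
    size_t hi = s.length;
    while (lo < hi && isAsciiSpace(s[lo]))
        ++lo;
    while (hi > lo && isAsciiSpace(s[hi - 1]))
        --hi;
    // Returns a slice of the input; nothing is copied.
    return s[lo .. hi];
}
```

Because the parameter is ubyte[] and not char[], the signature makes no
promise about UTF-8, and the char[] version in std.string stays as it is.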

> The implicit casting from char-whatever to ubyte-whatever is useful when you
> want to call C functions with D strings. Once again the code would rapidly
> become castville if it had to be done explicitly.

The only problem I have with an implicit cast to ubyte-whatever is that I 
worry it will have an unexpected side effect somewhere...  Perhaps I am 
being alarmist.

Regan