ubyte vs. char for non-UTF-8 (was Re: toString vs. toUtf8)
Regan Heath
regan at netmail.co.nz
Wed Nov 21 03:02:28 PST 2007
Matti Niemenmaa wrote:
> Regan Heath wrote:
> The point is that storing non-UTF data in ubyte/ushort/uint is a difficult task
> because even the C functions take char (or wchar_t, which I think is wchar on
> Windows and dchar elsewhere) and thus the code quickly becomes castville. cast
> here, cast there, everywhere a cast cast - and for no good reason.
Yeah, agreed 100%.
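To make the cast problem concrete, here is a minimal sketch in current D, assuming Latin-1 data held in a ubyte[]. The helper name latin1Length is made up for illustration; strlen comes from core.stdc.string, where it is declared as taking a char pointer.

```d
import core.stdc.string : strlen;

/// Hypothetical helper: length of a NUL-terminated Latin-1 buffer.
size_t latin1Length(const(ubyte)[] buf)
{
    // strlen expects a char pointer, so the ubyte pointer has to be
    // cast explicitly even though the bytes are not UTF-8.
    return strlen(cast(const(char)*) buf.ptr);
}

void main()
{
    // "café" in Latin-1 plus a terminating NUL; 0xE9 alone is not valid UTF-8.
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9, 0x00];
    assert(latin1Length(latin1) == 4);
}
```

Every further C call on the same buffer needs the same cast, which is where the noise comes from.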
> Thus I believe, as per my original proposal, that library functions should be
> converted to use ubyte[] where they are not meant to accept char[]. This may or may not
> mean changes in std.string - it's up to the Phobos maintainers to make the
> choice as to whether a function will ever require UTF-8, and whether to type it
> as taking char[] or ubyte[]. In any case, at least the C functions should take
> ubyte[].
Agreed. I would tend to leave the std.string functions taking char[] so
that when they finally step up and have complete UTF compatibility their
signatures do not change. If we need some functions, like strip, as a
stopgap for other encodings then I reckon we add them, perhaps to a
different module, and we use ubyte* (or whatever) instead of char[] for
the input parameter.
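A stopgap of that kind might look like the sketch below. stripBytes is a hypothetical name, not an existing Phobos function, and it takes a ubyte slice rather than ubyte* so the length travels with the data. It assumes a single-byte, ASCII-compatible encoding such as Latin-1 and treats only ASCII whitespace as strippable.

```d
/// Hypothetical stopgap, not part of Phobos: strip leading and trailing
/// ASCII whitespace from data in a single-byte, ASCII-compatible encoding.
const(ubyte)[] stripBytes(const(ubyte)[] s)
{
    static bool isAsciiSpace(ubyte c)
    {
        // Space, plus the range \t \n \v \f \r (0x09 through 0x0D).
        return c == ' ' || (c >= '\t' && c <= '\r');
    }

    while (s.length > 0 && isAsciiSpace(s[0]))
        s = s[1 .. $];
    while (s.length > 0 && isAsciiSpace(s[$ - 1]))
        s = s[0 .. $ - 1];
    return s;
}

void main()
{
    // " café\t\n" in Latin-1.
    ubyte[] padded = [0x20, 0x63, 0x61, 0x66, 0xE9, 0x09, 0x0A];
    auto trimmed = stripBytes(padded);
    assert(trimmed.length == 4);
    assert(trimmed[3] == 0xE9);
}
```

Encoding-specific whitespace, such as the Latin-1 no-break space 0xA0, is deliberately left alone here; a real version would need to decide that per encoding.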
> The implicit casting from char-whatever to ubyte-whatever is useful when you
> want to call C functions with D strings. Once again the code would rapidly
> become castville if it had to be done explicitly.
The only problem I have with implicit cast to ubyte-whatever is that I
worry it will have an unexpected side effect somewhere... Perhaps I am
being alarmist.
Regan