toString vs. toUtf8
Gregor Richards
Richards at codu.org
Mon Nov 19 13:51:05 PST 2007
Walter Bright wrote:
> Phobos (and D) has undergone some evolution in the thinking about
> unicode strings, and it certainly has a few anachronisms in its names.
> But I think we've evolved to the point where going forward, we know what
> to do:
>
> char[] => string
> wchar[] => wstring
> dchar[] => dstring
>
> These are all unicode strings. Putting non-unicode encodings in them,
> even temporarily, should be discouraged. Non-unicode encodings should
> use ubyte[], ushort[], etc.
I believe that this naming convention would be best in Tango (toString,
toWString, toDString). Naming them toUtf8, toUtf16, toUtf32 not only
means that the coder has to understand what character encodings are
(which would be nice but shouldn't be necessary), but that the familiar
terminology "string" we take from literally every other language is
lost. If we have to define "strings" as being a bit more confined than
"arrays of bytes which presumably have some sort of form", even better.
Bytes encoding random-arsed character sets into WTF-17 don't need to be
called "strings"; they can be called WTF-17 arrays.
- Gregor Richards