toString vs. toUtf8

Gregor Richards Richards at codu.org
Mon Nov 19 13:51:05 PST 2007


Walter Bright wrote:
> Phobos (and D) has undergone some evolution in the thinking about 
> unicode strings, and it certainly has a few anachronisms in its names. 
> But I think we've evolved to the point where going forward, we know what 
> to do:
> 
> char[] => string
> wchar[] => wstring
> dchar[] => dstring
> 
> These are all unicode strings. Putting non-unicode encodings in them, 
> even temporarily, should be discouraged. Non-unicode encodings should 
> use ubyte[], ushort[], etc.

I believe that this naming convention would be best in Tango (toString, 
toWString, toDString). Naming them toUtf8, toUtf16, toUtf32 not only 
means that the coder has to understand what character encodings are 
(which would be nice but shouldn't be necessary), but that the familiar 
terminology "string" we take from literally every other language is 
lost. If we have to define "strings" as being a bit more confined than 
"arrays of bytes which presumably have some sort of form", even better. 
Bytes encoding random-arsed character sets into WTF-17 don't need to be 
called "strings", they can be called WTF-17 arrays.

  - Gregor Richards


