toString vs. toUtf8

Mon Nov 19 14:27:08 PST 2007

Gregor Richards wrote:
> Walter Bright wrote:
>> Phobos (and D) has undergone some evolution in the thinking about 
>> unicode strings, and it certainly has a few anachronisms in its names. 
>> But I think we've evolved to the point where going forward, we know 
>> what to do:
>>
>> char[] => string
>> wchar[] => wstring
>> dchar[] => dstring
>>
>> These are all unicode strings. Putting non-unicode encodings in them, 
>> even temporarily, should be discouraged. Non-unicode encodings should 
>> use ubyte[], ushort[], etc.
> 
> I believe that this naming convention would be best in Tango (toString, 
> toWString, toDString). Naming them toUtf8, toUtf16, toUtf32 not only 
> means that the coder has to understand what character encodings are 
> (which would be nice but shouldn't be necessary), but that the familiar 
> terminology "string" we take from literally every other language is 
> lost. If we have to define "strings" as being a bit more confined than 
> "arrays of bytes which presumably have some sort of form", even better. 
> Bytes encoding random-arsed character sets into WTF-17 don't need to be 
> called "strings", they can be called WTF-17 arrays.
> 
>  - Gregor Richards

Agreed. It's also worth noting that toString as the name of a 
method/function has some precedent, so people familiar with Java, etc. 
will be able to get used to it right away, possibly without ever looking 
it up.
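
For anyone skimming the thread, here is a rough sketch of the three string 
types Walter lists, holding the same text in each encoding. It assumes a D 
compiler where the string/wstring/dstring aliases are available and uses 
Phobos' std.utf conversions (toUTF16, toUTF32). It is only an illustration, 
not Tango's API, and the sample text is made up.

```d
import std.stdio : writeln;
import std.utf : toUTF16, toUTF32;

void main()
{
    // The same five-character text in each of the three Unicode string types.
    // \u00E9 is a single code point (e with acute accent).
    string  s = "h\u00E9llo";   // UTF-8 code units  (char)
    wstring w = toUTF16(s);     // UTF-16 code units (wchar)
    dstring d = toUTF32(s);     // UTF-32 code units (dchar)

    // .length counts code units, not characters, so the UTF-8 form is
    // longer: the accented letter takes two bytes there.
    writeln(s.length); // 6
    writeln(w.length); // 5
    writeln(d.length); // 5
}
```

The type name alone says which encoding is inside, which is why a 
toString/toWString/toDString family reads naturally next to it.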