toString vs. toUtf8

Robert Fraser fraserofthenight at gmail.com
Mon Nov 19 14:27:08 PST 2007


Gregor Richards wrote:
> Walter Bright wrote:
>> Phobos (and D) has undergone some evolution in the thinking about 
>> unicode strings, and it certainly has a few anachronisms in its names. 
>> But I think we've evolved to the point where going forward, we know 
>> what to do:
>>
>> char[] => string
>> wchar[] => wstring
>> dchar[] => dstring
>>
>> These are all unicode strings. Putting non-unicode encodings in them, 
>> even temporarily, should be discouraged. Non-unicode encodings should 
>> use ubyte[], ushort[], etc.
> 
> I believe that this naming convention would be best in Tango (toString, 
> toWString, toDString). Naming them toUtf8, toUtf16, toUtf32 not only 
> means that the coder has to understand what character encodings are 
> (which would be nice but shouldn't be necessary), but that the familiar 
> terminology "string" we take from literally every other language is 
> lost. If we have to define "strings" as being a bit more confined than 
> "arrays of bytes which presumably have some sort of form", even better. 
> Bytes encoding random-arsed character sets into WTF-17 don't need to be 
> called "strings", they can be called WTF-17 arrays.
> 
>  - Gregor Richards

Agreed. It's also worth noting that toString as the name of a 
method/function has some precedent, so people familiar with Java, etc. 
will be able to get used to it right away, possibly without ever looking 
it up.
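
For anyone who hasn't run into the Java convention, here is a minimal 
sketch of what I mean. The Point class is made up purely for 
illustration (it isn't from Phobos, Tango, or the JDK); the only real 
API involved is Object.toString(), which Java calls implicitly whenever 
an object is concatenated into a string.

```java
// Hypothetical example class, invented for this illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Overrides Object.toString(); Java invokes this automatically
    // during string concatenation and when printing the object.
    @Override
    public String toString() {
        return "Point(" + x + ", " + y + ")";
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        // No explicit call needed: concatenation uses toString().
        System.out.println("p = " + p); // prints "p = Point(3, 4)"
    }
}
```

Someone coming from that background would presumably guess the name 
toString for the D equivalent before guessing toUtf8.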


