toString vs. toUtf8

Mon Nov 19 19:14:57 PST 2007

Sean Kelly wrote:
> Christopher Wright wrote:
>> Sean Kelly wrote:
>>> I was looking at converting Tango's use of toUtf8 to toString today 
>>> and ran into a bit of a quandary....
>>
>> toUtf8 is ugly.
>> toString/toWString/toDString are opaque and ugly, hard to distinguish 
>> from each other.
>>
>> toString, toStringW, toStringD? Still ugly.
>>
>> toUtf, toUtf16, toUtf32? Slightly less clear, but easier to type.
>>
>> toString, toUtf16, toUtf32? Inconsistent, but readable, and it fits 
>> well with other conventions.
> 
> I tend to place a tremendous amount of value on consistency, because the 
> more consistent an API is, the more likely my guesses about it are to be 
> correct.  In my opinion, that precludes using the option you suggest.
> 
> In my opinion, Walter's suggestion that alternate encodings not be 
> stored in strings is sufficient reason to not bother with the encoding 
> format in the function name (i.e. toUtf8/toUtf16/toUtf32).  I might 
> counter that I don't see any reason to lose meaning where it is so 
> easily provided, but on the other hand, I agree that new users are more 
> likely to know what a function named toString does than were it named 
> toUtf8.  These two points are a wash in my opinion.
> 
> The remaining concerns are less substantive.  I find toWString and 
> toDString difficult to read, but those feelings hold little more weight 
> than "toUtf8 is ugly."  I also feel that the term "string" is largely 
> meaningless in programming.  But I certainly couldn't win a debate with 
> either point.
> 
> I don't suppose there is anyone who does a lot of internationalization 
> programming who can comment on the utility of one convention vs. the 
> other?  I would love to hear some more practical concerns regarding the 
> naming convention for these functions.

My just-formed opinion :-) is that any toWstring/toDstring functions 
should be standalone functions that accept only "string" or "char" as 
input.  Yes, there will be a performance penalty in some cases, but I 
don't think it is significant enough to warrant creating lots of 
functions that do exactly the same thing, just with different 
encodings.
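
A minimal sketch of that idea, written in present-day D rather than the 
2007 dialect.  It assumes Phobos' std.utf.toUTF16 and std.utf.toUTF32 
for the transcoding.  The Point class and the free functions toWString 
and toDString are hypothetical names made up for illustration; they are 
not part of Tango or Phobos.

```d
import std.format : format;
import std.utf : toUTF16, toUTF32;

// Hypothetical type, for illustration only.
class Point
{
    int x, y;

    this(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    // The only text conversion the type itself has to provide.
    override string toString()
    {
        return format("(%s, %s)", x, y);
    }
}

// Hypothetical standalone converters.  They accept only a UTF-8
// string, so no type needs its own wide-string variants.
wstring toWString(string s)
{
    return toUTF16(s);
}

dstring toDString(string s)
{
    return toUTF32(s);
}

void main()
{
    auto p = new Point(3, 4);

    // Build the UTF-8 text first, then transcode it.
    wstring w = toWString(p.toString());
    dstring d = toDString(p.toString());

    assert(w == "(3, 4)"w);
    assert(d == "(3, 4)"d);
}
```

The cost is the extra pass mentioned above: the text is produced as 
UTF-8 and then transcoded, instead of being generated directly in the 
target encoding.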

--bb