toString vs. toUtf8

Bill Baxter dnewsgroup at billbaxter.com
Mon Nov 19 19:08:02 PST 2007


Sean Kelly wrote:
> Steven Schveighoffer wrote:
>> "Sean Kelly" wrote
>>> What I discovered during a test conversion of Tango was that 
>>> converting all uses of toUtf8 to toString /except/ those intended to 
>>> perform UTF conversions reduced code clarity, and left me unsure as to 
>>> which name I would actually use in a given situation.  For example, 
>>> there is quite a bit of code in the text and io packages which 
>>> convert an arbitrary type to a char[] for output, etc.  So by making 
>>> this change I was left with some conversions using toString and 
>>> others using toUtf8, toUtf16, and toUtf32, not to mention the fromXxx 
>>> versions of these same functions. As this is template code, the 
>>> choice between toString and toUtf8 in a given situation was unclear.
>>
>> Can you give an example file for this problem?  It would be easier to 
>> understand your problem if I knew exactly what you were talking 
>> about.  An actual example is fine, it doesn't need to be minimized 
>> (i.e. "take a look at tango/io/X.d")
> 
> tango.text.convert.Layout

I think you are right that the meanings of toString and toUtf8 are subtly 
different.

My take is that toString promises to produce some textual form of the 
input (and it happens to use the utf8 encoding).  This transformation 
might be wildly lossy and non-reversible as is the case with the default 
implementation of toString for classes, which just prints the class 
name.  toUtf8, on the other hand, promises to do a conversion.  It's 
probably lossless, or nearly so, and since the encoding is named 
explicitly, it's presumably a conversion between different string 
encodings.
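
To make the distinction concrete, here is a minimal sketch in present-day 
D.  It assumes Phobos' std.utf.toUTF8 and toUTF32 as stand-ins for the 
Tango toUtf8 family discussed in this thread, and Point is a hypothetical 
class invented for the illustration.

```d
import std.algorithm.searching : endsWith;
import std.utf : toUTF8, toUTF32;

// Hypothetical class that does not override toString.
class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

void main()
{
    // The inherited Object.toString only reports the class name
    // (module-qualified), so the values of x and y are lost.
    auto p = new Point(3, 4);
    string text = p.toString();
    assert(text.endsWith("Point"));

    // An encoding conversion keeps the content and changes only the
    // code-unit width, so it can be reversed.
    dstring wide = "héllo"d;
    string narrow = toUTF8(wide);
    assert(narrow.length == 6);       // 'é' takes two UTF-8 code units
    assert(toUTF32(narrow) == wide);  // round-trips without loss
}
```

The first half cannot be undone (there is no way back from the class name 
to the point), while the second half is reversible by construction.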

The thing is, sometimes A is B.  The best textual representation of a 
Utf32 string in Utf8 is going to be the Utf8-converted version of it. 
So in that case toString and toUtf8 happen to do the same thing.

So to me, the logical thing to do is to "alias toUtf8 toString;" in the 
cases where there's a converter that also suffices as a textual 
representation generator.  That way everything that can be represented 
as text has a toString method, and things that deal with encoding 
conversions have toUtf8/toUtf16/toUtf32 methods.
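
A rough sketch of that alias idea, written in current D syntax 
("alias toString = toUtf8;" rather than the older "alias toUtf8 toString;" 
form).  Utf32Text is a hypothetical wrapper type made up for this example, 
and its toUtf8 method is assumed to delegate to Phobos' std.utf.toUTF8.

```d
import std.utf : toUTF8;

// Hypothetical wrapper around UTF-32 text.
struct Utf32Text
{
    dstring data;

    // The encoding converter: UTF-32 in, UTF-8 out.
    string toUtf8() { return toUTF8(data); }

    // Here the converter is also the best textual representation,
    // so toString is just another name for it.
    alias toString = toUtf8;
}

void main()
{
    auto t = Utf32Text("héllo"d);
    assert(t.toString() == t.toUtf8());
    assert(t.toString() == "héllo");
}
```

Generic code that only knows about toString keeps working, and code that 
cares about encodings can still ask for toUtf8 by name.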

So in that case I don't see any reason for toWString, toDString. 
toString generates your canonical "textual representation" for whatever 
it is.  If you need that in a different encoding for whatever reason 
then you need to run an encoding converter on it.
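
As an illustration of doing without toWString and toDString, the sketch 
below produces the text once with toString and then re-encodes it.  Release 
is a hypothetical type, and std.utf.toUTF16 / toUTF32 from Phobos are 
assumed here as the encoding converters.

```d
import std.conv : to;
import std.utf : toUTF16, toUTF32;

// Hypothetical type with a single canonical textual form.
struct Release
{
    int major, minor;

    string toString()
    {
        return to!string(major) ~ "." ~ to!string(minor);
    }
}

void main()
{
    auto r = Release(1, 2);

    // One canonical representation...
    string s = r.toString();
    assert(s == "1.2");

    // ...re-encoded on demand instead of via toWString / toDString.
    wstring w = toUTF16(s);
    dstring d = toUTF32(s);
    assert(w == "1.2"w);
    assert(d == "1.2"d);
}
```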

--bb