Semantics of toString

Tue Nov 10 10:53:38 PST 2009

Bill Baxter wrote:
> 2009/11/10 Denis Koroskin <2korden at gmail.com>:
>> On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam at nospam.com> wrote:
>>
>>> Bill Baxter wrote:
>>>> On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam at nospam.com> wrote:
>>>>> Lutger wrote:
>>>>>> Justin Johansson wrote:
>>>>>>
>>>>>>> Lutger Wrote:
>>>>>>>
>>>>>>>> Justin Johansson wrote:
>>>>>>>>
>>>>>>>>> I assert that the semantics of "toString" or similarly
>>>>>>>>> named/purposed
>>>>>>>>> methods/functions in many PL's (including and not limited to D) is
>>>>>>>>> ill-defined.
>>>>>>>>>
>>>>>>>>> To put this statement into perspective, I would be most appreciative
>>>>>>>>> of
>>>>>>>>> D NG readers responding with their own idea(s) of what the semantics
>>>>>>>>> of
>>>>>>>>> "toString" are (or should be) in a language agnostic ideology.
>>>>>>>>>
>>>>>>>> My other reply didn't take the language agnostic into account, sorry.
>>>>>>>>
>>>>>>>> Semantics of toString would depend on the object, I would think there
>>>>>>>> are
>>>>>>>> three general types of objects:
>>>>>>>>
>>>>>>>> 1. objects with only one sensible or one clear default string
>>>>>>>> representation, like integers. Maybe even none of these exist
>>>>>>>> (except
>>>>>>>> strings themselves?)
>>>>>>>>
>>>>>>>> 2. objects that, given some formatting options or locale have a clear
>>>>>>>> string representation. Floating points, dates, currency and the like.
>>>>>>>>
>>>>>>>> 3. objects that have no sensible default representation.
>>>>>>>>
>>>>>>>> toString() would not make sense for 3) type objects and only for 2)
>>>>>>>> type
>>>>>>>> objects as part of a formatting / localization package.
>>>>>>>>
>>>>>>>> toString() as a debugging aid sometimes doubles as a formatter for 1)
>>>>>>>> and
>>>>>>>> 2) class objects, but that may be more confusing than it's worth.
>>>>>>>>
>>>>>>> Thanks for that Lutger.
>>>>>>>
>>>>>>> Do you think it would make better sense if programming languages/their
>>>>>>> libraries separated functions/methods which are currently loosely
>>>>>>> purposed
>>>>>>> as "toString" into methods which are more specific to the types you
>>>>>>> suggest (leaving only the types/classifications and number thereof to
>>>>>>> argue about)?
>>>>>>>
>>>>>>> In my own D project, I've introduced a toDebugString method and left
>>>>>>> toString alone. There are times when I like D's default toString
>>>>>>> printing
>>>>>>> out the name of the object
>>>>>>> class.  For debug purposes there are times also when I like to see a
>>>>>>> string printed
>>>>>>> out in quotes so you can tell the difference between "123" and 123.
>>>>>>>  Then
>>>>>>> again, and since I'm working on a scripting language, sometimes I like
>>>>>>> to
>>>>>>> see debug output distinguish between different numeric types.
>>>>>>>
>>>>>>> Anyway going by the replies on this topic, looks like most people view
>>>>>>> toString as being good for debug purposes and that's about it.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Justin
>>>>>>>
>>>>>> Your design makes better sense (to me at least) because it is based on
>>>>>> why
>>>>>> you want a string from some object.
>>>>>> Take .NET for example: it does provide very elaborate and nice
>>>>>> formatting
>>>>>> options based on toString() with parameters. For some types however,
>>>>>> the
>>>>>> default toString() gives you the name of the type itself which is in no
>>>>>> way
>>>>>> related to formatting an object. You learn to work with it, but I find
>>>>>> it a
>>>>>> bit muddled.
>>>>>> As a last note, I think people view toString as a debug thing mostly
>>>>>> because it is very underpowered.
>>>>> There is a definite use for such a thing. But the existing toString()
>>>>> is
>>>>> much, much worse than useless. People think you can do something with
>>>>> it,
>>>>> but you can't.
>>>>> eg, people have asked for BigInt to support toString(). That is an
>>>>> over-my-dead-body.
>>>>  You can definitely do something with it -- printf debugging.  And if I
>>>> were using BigInt, that's exactly why I'd want BigInt to have a
>>>> toString.
>>> I almost always want to print the value out in hex. And with some kind of
>>> digit separators, so that I can see how many digits it has.
>>>
>>>  Just out of curiosity, how does someone print out the
>>>> value of a BigInt right now?
>>> In Tango, there's just .toHex() and .toDecimalString(). Needs proper
>>> formatting options, it's the biggest thing which isn't done. I hit one too
>>> many compiler segfaults and started patching the compiler instead <g>. But
>>> I really want a decent toString().
>>>
>>> Given a BigInt n, you should be able to just do
>>>
>>> writefln("%s %x", n, n);  // Phobos
>>> formatln("{0} {0:X}", n); // Tango
>>>
>>> To solve this part of the issue, it would be enough to have toString()
>>> take a string parameter. (it would be "x" or "X" in this case).
>>>
>>> string toString(string fmt);
>>> But the performance would still be very poor, and that's much more
>>> difficult to solve.
>> Yes, it would solve half of the toString problems.
>>
>> Another part (i.e. memory allocation) could be solved by providing an
>> optional buffer to the toString:
>>
>> char[] toString(string format = "s" /* comes from %s which is a default
>> qualifier */, char[] buffer = null)
>> {
>>    // operate on the buffer, possibly resizing it
>>    // which is safe and fast - it only allocates
>>    // when *really* necessary, instead of always, as now
>>    return buffer;
>> }
> 
> With Don's delegate idea, if you do have a toString with special
> performance concerns, then it can use its own stack-allocated buffer.
> 
> void toString(void delegate(const(char)[]) put, string format)
> {
>     char[512] preallocBuffer;
>     foreach( ... ) {
>         ...
>         put(preallocBuffer[0..lenUsed]);
>     }
> }

Thanks. 'put' is so much better than 'sink'. <g>
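
For anyone following along, here is a minimal, self-contained sketch of that
idea. The Id struct and the render helper are hypothetical names made up for
illustration, and treating "x" as hex with everything else as decimal is an
assumption, not an agreed convention:

```d
import std.stdio : writeln;

// Hypothetical wrapper type, for illustration only.
struct Id
{
    uint value;

    // Emits the text through `put` instead of returning a new string.
    // Assumption: "x" selects hex, any other format string means decimal.
    void toString(void delegate(const(char)[]) put, string format) const
    {
        char[16] buf;                 // stack storage; a uint needs at most 10 digits
        immutable uint base = (format == "x") ? 16 : 10;
        size_t i = buf.length;
        uint v = value;
        do
        {
            // Fill the buffer from the end, least significant digit first.
            buf[--i] = "0123456789abcdef"[v % base];
            v /= base;
        } while (v != 0);
        put(buf[i .. $]);             // hand over only the part that was used
    }
}

// Hypothetical helper: gathers everything one toString call emits.
string render(Id id, string format)
{
    char[] result;
    id.toString((const(char)[] piece) { result ~= piece; }, format);
    return result.idup;
}

void main()
{
    writeln(render(Id(255), "s"));
    writeln(render(Id(255), "x"));
}
```

The digits are built in a fixed stack buffer and passed to put as a slice, so
toString itself never allocates; only the caller's choice of sink (here an
appending array) does.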

> If the buffer is going to be passed in, then probably it should be
> passed in as a full fledged output stream object with .write() methods
> and such.  I don't want to have to worry about buffer management to
> write a toString method.  That should be encapsulated.  But it seems
> to me that Don's method offers exactly the right minimality of
> interface to allow encapsulating that management without requiring it
> to be done in a heavy-handed way.

One thing it doesn't (easily) handle is the case where an int argument
gives the field width of another one (e.g. the "%*s" writefln format). I guess
this can still be handled (very inefficiently) by converting the
parameter value into a text number. Generally, though, that'd only be
for direct interchangeability with a built-in type; you'd normally do
such things by calling a member function on the struct.
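
A rough sketch of what that workaround might look like, to make the cost
visible. Everything here is hypothetical: the Name struct, the render helper,
and the assumption that the format string has the shape "<width>s". The int is
turned into text by the caller and then parsed back into a number inside
toString:

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical type, for illustration only.
struct Name
{
    string text;

    // Assumption: `format` looks like "<width>s", e.g. "8s".
    // A missing or zero width means no padding.
    void toString(void delegate(const(char)[]) put, string format) const
    {
        // Parse the leading decimal digits back into a number.
        size_t width = 0;
        foreach (c; format)
        {
            if (c < '0' || c > '9')
                break;
            width = width * 10 + (c - '0');
        }
        // Right-align by emitting one space per missing column.
        for (size_t n = text.length; n < width; ++n)
            put(" ");
        put(text);
    }
}

// Hypothetical helper standing in for the formatting library: it receives
// the width as an int (the "*" argument) and must splice it into the
// string-typed format parameter.
string render(Name name, int width)
{
    string format = width.to!string ~ "s";   // int -> text, the wasteful step
    char[] result;
    name.toString((const(char)[] piece) { result ~= piece; }, format);
    return result.idup;
}

void main()
{
    writeln("[", render(Name("abc"), 8), "]");
}
```

The round trip from int to text and back is the inefficiency: the width is
already a number at the call site, but the string-only format parameter forces
it through a conversion in each direction.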

The other issue is grauzone's comment: perhaps compile-time varargs make 
this whole approach obsolete.