Semantics of toString

Tue Nov 10 10:32:32 PST 2009

2009/11/10 Denis Koroskin <2korden at gmail.com>:
> On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam at nospam.com> wrote:
>
>> Bill Baxter wrote:
>>>
>>> On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam at nospam.com> wrote:
>>>>
>>>> Lutger wrote:
>>>>>
>>>>> Justin Johansson wrote:
>>>>>
>>>>>> Lutger Wrote:
>>>>>>
>>>>>>> Justin Johansson wrote:
>>>>>>>
>>>>>>>> I assert that the semantics of "toString" or similarly named/purposed
>>>>>>>> methods/functions in many PLs (including but not limited to D) are
>>>>>>>> ill-defined.
>>>>>>>>
>>>>>>>> To put this statement into perspective, I would be most appreciative of
>>>>>>>> D NG readers responding with their own idea(s) of what the semantics of
>>>>>>>> "toString" are (or should be) in a language-agnostic ideology.
>>>>>>>>
>>>>>>> My other reply didn't take the language agnostic into account, sorry.
>>>>>>>
>>>>>>> Semantics of toString would depend on the object. I would think there
>>>>>>> are three general types of objects:
>>>>>>>
>>>>>>> 1. objects with only one sensible or one clear default string
>>>>>>> representation, like integers. Maybe none of these even exist (except
>>>>>>> strings themselves?)
>>>>>>>
>>>>>>> 2. objects that, given some formatting options or locale, have a clear
>>>>>>> string representation: floating-point numbers, dates, currency and the
>>>>>>> like.
>>>>>>>
>>>>>>> 3. objects that have no sensible default representation.
>>>>>>>
>>>>>>> toString() would not make sense for type 3) objects, and for type 2)
>>>>>>> objects only as part of a formatting / localization package.
>>>>>>>
>>>>>>> toString() as a debugging aid sometimes doubles as a formatter for
>>>>>>> type 1) and 2) objects, but that may be more confusing than it's worth.
>>>>>>>
>>>>>> Thanks for that Lutger.
>>>>>>
>>>>>> Do you think it would make better sense if programming languages/their
>>>>>> libraries separated functions/methods which are currently loosely
>>>>>> purposed as "toString" into methods which are more specific to the
>>>>>> types you suggest (leaving only the types/classifications and number
>>>>>> thereof to argue about)?
>>>>>>
>>>>>> In my own D project, I've introduced a toDebugString method and left
>>>>>> toString alone. There are times when I like D's default toString
>>>>>> printing out the name of the object class. For debug purposes there are
>>>>>> times also when I like to see a string printed out in quotes so you can
>>>>>> tell the difference between "123" and 123. Then again, and since I'm
>>>>>> working on a scripting language, sometimes I like to see debug output
>>>>>> distinguish between different numeric types.
>>>>>>
>>>>>> Anyway, going by the replies on this topic, it looks like most people
>>>>>> view toString as being good for debug purposes and that's about it.
>>>>>>
>>>>>> Cheers
>>>>>> Justin
>>>>>>
>>>>> Your design makes better sense (to me at least) because it is based on
>>>>> why you want a string from some object.
>>>>> Take .NET for example: it does provide very elaborate and nice
>>>>> formatting options based on toString() with parameters. For some types,
>>>>> however, the default toString() gives you the name of the type itself,
>>>>> which is in no way related to formatting an object. You learn to work
>>>>> with it, but I find it a bit muddled.
>>>>> As a last note, I think people view toString as a debug thing mostly
>>>>> because it is very underpowered.
>>>>
>>>> There is a definite use for such a thing. But the existing toString() is
>>>> much, much worse than useless. People think you can do something with
>>>> it, but you can't.
>>>> E.g., people have asked for BigInt to support toString(). That is an
>>>> over-my-dead-body.
>>>
>>>  You can definitely do something with it -- printf debugging.  And if I
>>> were using BigInt, that's exactly why I'd want BigInt to have a
>>> toString.
>>
>> I almost always want to print the value out in hex. And with some kind of
>> digit separators, so that I can see how many digits it has.
>>
>>> Just out of curiosity, how does someone print out the
>>> value of a BigInt right now?
>>
>> In Tango, there's just .toHex() and .toDecimalString(). It needs proper
>> formatting options; that's the biggest thing which isn't done. I hit one
>> too many compiler segfaults and started patching the compiler instead <g>.
>> But I really want a decent toString().
>>
>> Given a BigInt n, you should be able to just do
>>
>> writefln("%s %x", n, n);  // Phobos
>> formatln("{0} {0:X}", n); // Tango
>>
>> To solve this part of the issue, it would be enough to have toString()
>> take a string parameter. (it would be "x" or "X" in this case).
>>
>> string toString(string fmt);
>>
>> But the performance would still be very poor, and that's much more
>> difficult to solve.
>
> Yes, it would solve half of the toString problems.
>
> Another part (i.e. memory allocation) could be solved by providing an
> optional buffer to the toString:
>
> char[] toString(string format = "s" /* comes from %s which is a default
> qualifier */, char[] buffer = null)
> {
>    // operate on the buffer, possibly resizing it
>    // which is safe and fast - it only allocates
>    // when *really* necessary, instead of always, as now
>    return buffer;
> }

With Don's delegate idea, if you do have a toString with special
performance concerns, then it can use its own stack-allocated buffer.

void toString(void delegate(const(char)[]) put, string format)
{
    char[512] preallocBuffer;
    foreach( ... ) {
        ...
        put(preallocBuffer[0..lenUsed]);
    }
}

In some cases (like writefln) this should be almost as efficient as
passing a buffer in, and in any case it avoids willy-nilly unbounded
allocations.
The nice thing is that it's also easy to upgrade to.  You can keep it
simple and leave toString pretty much as you had it before, changing
only the signature and the return.

void toString(void delegate(const(char)[]) put, string format)
{
    char[] ret;
    foreach( ... ) {
        ...
        ret ~= "...";
    }
    put(ret);  // only this line needed to change for Don-style toString
}

And to get the string you just need to call format:

    assert(std.string.format("%s", thing) == "blah");
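
As a slightly fuller sketch of the stack-buffer variant: the HexBytes
type below is made up for illustration, the format parameter is left
out for brevity, and it assumes a Phobos whose std.format picks up a
toString that takes a sink delegate.  Treat it as one possible shape,
not a finished design.

```d
import std.format : format;

// Hypothetical type, for illustration only: prints its bytes as hex.
struct HexBytes
{
    const(ubyte)[] data;

    // Sink-style toString (the format parameter is omitted for brevity).
    void toString(scope void delegate(const(char)[]) put) const
    {
        static immutable string digits = "0123456789abcdef";
        char[64] buf;        // stack-allocated scratch buffer
        size_t lenUsed = 0;

        foreach (b; data)
        {
            buf[lenUsed++] = digits[b >> 4];
            buf[lenUsed++] = digits[b & 0x0F];
            if (lenUsed == buf.length)
            {
                put(buf[0 .. lenUsed]);   // flush a full buffer
                lenUsed = 0;
            }
        }
        if (lenUsed > 0)
            put(buf[0 .. lenUsed]);       // flush the remainder
    }
}

unittest
{
    const(ubyte)[] raw = [0xDE, 0xAD, 0xBE, 0xEF];
    assert(format("%s", HexBytes(raw)) == "deadbeef");
}
```

The buffer never grows: whatever the length of data, toString itself
touches only those 64 chars on the stack, and any allocation is up to
whoever supplied put.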

If a buffer is going to be passed in, then it should probably be
passed in as a full-fledged output stream object with .write() methods
and such.  I don't want to have to worry about buffer management to
write a toString method.  That should be encapsulated.  It seems
to me that Don's method offers exactly the right minimal
interface: it allows that management to be encapsulated without
requiring it to be done in a heavy-handed way.
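
To illustrate the encapsulation point: with the delegate signature,
whoever calls toString decides where the characters go, and the
toString author never sees a buffer.  A rough sketch, where Pair and
collect are hypothetical names of mine and std.array.appender stands
in for whatever growable buffer a library might use:

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical type, for illustration only.
struct Pair
{
    int a, b;

    // No buffer management here: each piece goes straight to the sink.
    void toString(scope void delegate(const(char)[]) put) const
    {
        put("(");
        formattedWrite(put, "%d, %d", a, b);
        put(")");
    }
}

// Hypothetical helper: the caller owns the growable buffer.
string collect(T)(T thing)
{
    auto buf = appender!string();
    thing.toString((const(char)[] chunk) { buf.put(chunk); });
    return buf.data;
}

unittest
{
    assert(collect(Pair(1, 2)) == "(1, 2)");
}
```

A different caller could pass a delegate that writes each chunk to a
file or socket instead, and Pair.toString would not change.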

--bb