Semantics of toString

Tue Nov 10 09:20:16 PST 2009

On Tue, 10 Nov 2009 15:30:20 +0300, Don <nospam at nospam.com> wrote:

> Bill Baxter wrote:
>> On Tue, Nov 10, 2009 at 2:51 AM, Don <nospam at nospam.com> wrote:
>>> Lutger wrote:
>>>> Justin Johansson wrote:
>>>>
>>>>> Lutger Wrote:
>>>>>
>>>>>> Justin Johansson wrote:
>>>>>>
>>>>>>> I assert that the semantics of "toString" or similarly named/purposed
>>>>>>> methods/functions in many PL's (including and not limited to D) is
>>>>>>> ill-defined.
>>>>>>>
>>>>>>> To put this statement into perspective, I would be most appreciative of
>>>>>>> D NG readers responding with their own idea(s) of what the semantics of
>>>>>>> "toString" are (or should be) in a language agnostic ideology.
>>>>>>>
>>>>>> My other reply didn't take the language agnostic into account, sorry.
>>>>>>
>>>>>> Semantics of toString would depend on the object, I would think there
>>>>>> are three general types of objects:
>>>>>>
>>>>>> 1. objects with only one sensible or one clear default string
>>>>>> representation, like integers. Maybe even none of these exist (except
>>>>>> strings themselves?)
>>>>>>
>>>>>> 2. objects that, given some formatting options or locale have a clear
>>>>>> string representation. floating points, dates, currency and the like.
>>>>>>
>>>>>> 3. objects that have no sensible default representation.
>>>>>>
>>>>>> toString() would not make sense for 3) type objects and only for 2) type
>>>>>> objects as part of a formatting / localization package.
>>>>>>
>>>>>> toString() as a debugging aid sometimes doubles as a formatter for 1)
>>>>>> and 2) class objects, but that may be more confusing than it's worth.
>>>>>>
>>>>> Thanks for that Lutger.
>>>>>
>>>>> Do you think it would make better sense if programming languages/their
>>>>> libraries separated functions/methods which are currently loosely
>>>>> purposed as "toString" into methods which are more specific to the
>>>>> types you suggest (leaving only the types/classifications and number
>>>>> thereof to argue about)?
>>>>>
>>>>> In my own D project, I've introduced a toDebugString method and left
>>>>> toString alone. There are times when I like D's default toString
>>>>> printing out the name of the object class.  For debug purposes there
>>>>> are times also when I like to see a string printed out in quotes so
>>>>> you can tell the difference between "123" and 123.  Then again, and
>>>>> since I'm working on a scripting language, sometimes I like to see
>>>>> debug output distinguish between different numeric types.
>>>>>
>>>>> Anyway going by the replies on this topic, looks like most people view
>>>>> toString as being good for debug purposes and that's about it.
>>>>>
>>>>> Cheers
>>>>> Justin
>>>>>
>>>> Your design makes better sense (to me at least) because it is based on
>>>> why you want a string from some object.
>>>> Take .NET for example: it does provide very elaborate and nice
>>>> formatting options based on toString() with parameters. For some types
>>>> however, the default toString() gives you the name of the type itself
>>>> which is in no way related to formatting an object. You learn to work
>>>> with it, but I find it a bit muddled.
>>>> As a last note, I think people view toString as a debug thing mostly
>>>> because it is very underpowered.
>>> There is a definite use for such a thing. But the existing toString() is
>>> much, much worse than useless. People think you can do something with it,
>>> but you can't.
>>> eg, people have asked for BigInt to support toString(). That is an
>>> over-my-dead-body.
>>  You can definitely do something with it -- printf debugging.  And if I
>> were using BigInt, that's exactly why I'd want BigInt to have a
>> toString.
>
> I almost always want to print the value out in hex. And with some kind of
> digit separators, so that I can see how many digits it has.
>
>> Just out of curiosity, how does someone print out the
>> value of a BigInt right now?
>
> In Tango, there's just .toHex() and .toDecimalString(). Needs proper
> formatting options, it's the biggest thing which isn't done. I hit one
> too many compiler segfaults and started patching the compiler instead
> <g>. But I really want a decent toString().
>
> Given a BigInt n, you should be able to just do
>
> writefln("%s %x", n, n);  // Phobos
> formatln("{0} {0:X}", n); // Tango
>
> To solve this part of the issue, it would be enough to have toString()
> take a string parameter. (it would be "x" or "X" in this case).
>
> string toString(string fmt);
> But the performance would still be very poor, and that's much more
> difficult to solve.

Yes, it would solve half of the toString problems.

Another part (i.e. memory allocation) could be solved by providing an  
optional buffer to the toString:

// "s" comes from %s, which is the default qualifier
char[] toString(string format = "s", char[] buffer = null)
{
     // operate on the buffer, possibly resizing it
     // which is safe and fast - it only allocates
     // when *really* necessary, instead of always, as now
     return buffer;
}
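
To make that a bit more concrete, here is a minimal sketch of what such a toString could look like for a small value type. Celsius is a made-up example type (it is not in Phobos or Tango), and the format argument is accepted but ignored to keep the sketch short:

```d
// Hypothetical example type, used only to illustrate the proposal.
struct Celsius
{
    int degrees;

    // Writes the decimal form of `degrees` followed by 'C' into `buffer`
    // and returns the filled slice. It allocates only when the caller's
    // buffer is too small. `format` is ignored in this sketch.
    char[] toString(string format = "s", char[] buffer = null) const
    {
        char[12] tmp; // "-2147483648" is 11 characters, plus the 'C'
        size_t pos = tmp.length;
        tmp[--pos] = 'C';

        long v = degrees; // long, so negating int.min cannot overflow
        immutable negative = v < 0;
        if (negative) v = -v;
        do
        {
            tmp[--pos] = cast(char)('0' + v % 10);
            v /= 10;
        } while (v != 0);
        if (negative) tmp[--pos] = '-';

        immutable len = tmp.length - pos;
        if (buffer.length < len)
            buffer = new char[len]; // the only allocation
        buffer[0 .. len] = tmp[pos .. $];
        return buffer[0 .. len];
    }
}
```

A caller that passes a large enough stack buffer gets back a slice of that same buffer, so nothing is allocated; a caller that passes nothing gets a fresh array, as with today's toString.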

You can use it almost the same way you used it before:

// assumeUnique is needed because we return a mutable string now
string s = assumeUnique(someObject.toString());

Optimization example:

int sprintf(string format, ...)
{
     char[512] preallocatedBuffer;
     char[] buffer = preallocatedBuffer[]; // buffer may grow, but
     // initially points to a preallocatedBuffer

     char[] storage = buffer[]; // storage for a current element

     ...
     for (...) { // iterate over qualifiers (and arguments)
         string currentQualifier = format[i..j];
         auto currentArgument = argsTuple[n];

         char[] result = currentArgument.toString(currentQualifier, storage);
         if (result.ptr is storage.ptr) {
             // okay, string was constructed in-place
             storage = storage[result.length..$];
         } else {
             // storage didn't have enough space for the whole
             // string (toString allocated a new one)

             size_t offset = buffer.length - storage.length;

             // increase the capacity until the result fits
             // (growing .length keeps the old contents)
             do {
                 buffer.length *= 2;
             } while (buffer.length - offset < result.length);

             // append our string to the buffer
             buffer[offset..offset+result.length] = result[];

             // the rest of the buffer becomes the new storage
             storage = buffer[offset+result.length..$];
         }
         }
     }
     ...
}
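
Since the function above is full of "...", here is a smaller, self-contained sketch of the same idea that should compile as is. Both intToString (a stand-in for a per-type toString that takes a buffer) and formatAll are hypothetical names, and the sketch handles only ints separated by spaces:

```d
// Stand-in for a per-type toString(format, buffer): writes the decimal
// text of `value` into `storage`, allocating only if it does not fit.
char[] intToString(int value, char[] storage)
{
    char[11] tmp; // "-2147483648" is 11 characters
    size_t pos = tmp.length;
    long v = value; // long, so negating int.min cannot overflow
    immutable negative = v < 0;
    if (negative) v = -v;
    do
    {
        tmp[--pos] = cast(char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    if (negative) tmp[--pos] = '-';

    immutable len = tmp.length - pos;
    if (storage.length < len)
        storage = new char[len];
    storage[0 .. len] = tmp[pos .. $];
    return storage[0 .. len];
}

// Hypothetical driver: formats every value into `buffer`, separated by
// spaces, and returns the filled slice.
char[] formatAll(const(int)[] values, char[] buffer)
{
    size_t used = 0;

    // Grows `buffer` geometrically until it can hold `needed` characters.
    void ensureCapacity(size_t needed)
    {
        if (needed <= buffer.length) return;
        size_t newLength = buffer.length ? buffer.length : 16;
        while (newLength < needed) newLength *= 2;
        buffer.length = newLength; // reallocates and keeps the old contents
    }

    foreach (i, v; values)
    {
        if (i > 0)
        {
            ensureCapacity(used + 1);
            buffer[used++] = ' ';
        }

        char[] storage = buffer[used .. $];
        char[] result = intToString(v, storage);

        // The length check keeps the pointer comparison meaningful
        // even when `storage` is empty.
        immutable inPlace = storage.length >= result.length
                            && result.ptr is storage.ptr;
        if (!inPlace)
        {
            // Did not fit: grow the buffer and copy the result in.
            ensureCapacity(used + result.length);
            buffer[used .. used + result.length] = result[];
        }
        used += result.length;
    }
    return buffer[0 .. used];
}
```

As long as everything fits, the returned slice points into the caller's buffer and no allocation happens; once something does not fit, the buffer moves to the heap and the following values are written in place again.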

Another example:

class Array(T)
{
     // ...
     private T[] elements;

     char[] toString(string format = "s", char[] buffer = null) {
         // the builder reallocates when there is no space left
         auto builder = StringBuilder(buffer);
         builder.append("[");
         foreach (i, o; elements) {
             if (i > 0) builder.append(", "); // separator

             buffer = builder.getBuffer()[builder.length..$];
             char[] result = o.toString(format, buffer);
             if (result.ptr is buffer.ptr) {
                 // no reallocation
                 builder.length += result.length; // without copying
             } else {
                 builder.append(result);
             }
         }

         builder.append("]");

         return builder.toString();
     }
}

auto array = new Array!(int);
array ~= [0, 1, 2, 3, 4];
assert(array.toString() == "[0, 1, 2, 3, 4]");

It's not very easy to take advantage of, but it's usable the old way  
(well, almost).

Any ideas?