Semantics of toString

Thu Nov 12 13:54:13 PST 2009

Steven Schveighoffer wrote:
> On Thu, 12 Nov 2009 16:19:39 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> Steven Schveighoffer wrote:
>>> On Thu, 12 Nov 2009 14:40:12 -0500, Andrei Alexandrescu 
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> Steven Schveighoffer wrote:
>>>>> On Thu, 12 Nov 2009 13:46:08 -0500, Andrei Alexandrescu 
>>>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>>>
>>>>>>>   From my C++ book, it appears to only use virtual inheritance.  
>>>>>>> I don't know enough about virtual inheritance to know how that 
>>>>>>> changes function calls.
>>>>>>>  As far as virtual functions, only the destructor is virtual, so 
>>>>>>> there is no issue there.
>>>>>>
>>>>>> You're right, but there is an issue because as far as I can recall 
>>>>>> these functions' implementations do end up calling a virtual 
>>>>>> function per char; that might be streambuf.overflow. I'm not keen 
>>>>>> on investigating this any further, but I'd be grateful if you 
>>>>>> shared any related knowledge.
>>>>>  Yep, you are right.  It appears the reason they do this is so the 
>>>>> conversion to the appropriate width can be done per character (and 
>>>>> is a no-op for char).
>>>>>
>>>>>> At the end of the day, there seems to be violent agreement that we 
>>>>>> don't want one virtual call per character or one delegate call per 
>>>>>> character.
>>>>>  After running my tests, it appears the difference between a virtual 
>>>>> call and a delegate call is negligible, and the difference between a 
>>>>> virtual call and a direct call is only slightly less so, so I think 
>>>>> the virtualness may not matter.  
>>>>> However, I think avoiding one *call* per character is a worthy goal.
>>>>>  This doesn't mean I change my mind :)  I still think there is 
>>>>> little benefit to having to conjure up an entire object just to 
>>>>> convert something to a string vs. writing a simple inner function.
>>>>>  One way to find out is to support only char[], and see who 
>>>>> complains :)  It'd be much easier to go from supporting char[] to 
>>>>> supporting all the widths than going from supporting all to just one.
>>>>
>>>> One problem I just realized is that, if we e.g. offer only put(in 
>>>> char[]) or a delegate to that effect, we make it impossible to 
>>>> output one character efficiently. The (&c)[0 .. 1] trick will not 
>>>> work in safe mode. You'd have to allocate a one-element array 
>>>> dynamically.
>>>  char[1] buf;
>>> buf[0] = c;
>>> put(buf);
>>
>> This would not compile in SafeD.
> 
> :O
> 
> Why not?  I would expect that using a local buffer would be the main way 
> for converting non-string things to strings, or to avoid calling the 
> delegate/vfunction lots of times.

Well, a stack-allocated buffer lives only as long as its stack frame, and 
passing a slice of it to a function may let that function escape the 
slice, i.e. store it somewhere that outlives the buffer.

> i.e. if I want to output an integer i:
> 
> 
> if(i == 0) put("0");
> else
> {
>   // assumes i is non-negative
>   char[20] buf;
>   size_t idx = buf.length;
>   while(i != 0)
>   {
>     --idx;
>     buf[idx] = cast(char)('0' + i % 10); // digit value to ASCII
>     i /= 10;
>   }
>   put(buf[idx..$]); // no compily in SafeD???
> }
> 
> Do I have to allocate a heap buffer in SafeD?

I'm afraid so. Unless of course you have a put(dchar) routine handy :o).
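
A minimal sketch of that per-character alternative, for illustration only. 
The Sink type and the putInt helper are hypothetical names, not anything 
proposed in this thread, and the sink takes a plain char rather than a 
dchar to keep the example short. It emits digits most-significant first, 
so no stack buffer is sliced, at the price of one put call per digit, 
which is the per-character cost discussed above.

```d
// Hypothetical sink, assumed for illustration; it just appends to an array.
struct Sink
{
    char[] data;
    void put(char c) @safe { data ~= c; }
    void put(const(char)[] s) @safe { data ~= s; }
}

// Writes value in decimal, one character at a time, without a local buffer.
void putInt(ref Sink sink, long value) @safe
{
    if (value == 0)
    {
        sink.put('0');
        return;
    }
    if (value < 0) sink.put('-');
    // Take the magnitude in unsigned arithmetic so long.min is handled too.
    ulong magnitude = value < 0 ? 0UL - cast(ulong) value : cast(ulong) value;
    // Find the largest power of ten that does not exceed magnitude.
    ulong divisor = 1;
    while (magnitude / divisor >= 10) divisor *= 10;
    // Emit digits from most significant to least significant.
    while (divisor != 0)
    {
        sink.put(cast(char)('0' + magnitude / divisor % 10));
        divisor /= 10;
    }
}

unittest
{
    Sink s;
    s.put("i = ");
    putInt(s, -1203);
    assert(s.data == "i = -1203");
}
```

Compiling with -unittest runs the usage example in the unittest block.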

>>>> Also, many OSs adopted UTF-16 as their standard format. It may be 
>>>> wise to design for compatibility.
>>>  So you want toString's to look like this?
>>>  version(utf16isdefault)
>>> {
>>>   textobj.put("Array: "w);
>>>   ...
>>> }
>>> else
>>> {
>>>   textobj.put("Array: ");
>>>   ...
>>> }
>>>  -Steve
>>
>>
>> I was just thinking of offering an interface that offers utf8 and 
>> utf16 and utf32.
> 
> Yes, and your explanation for this is that many OSes adopt UTF-16 as 
> their standard format.  My expectation is that the outputter will 
> convert to the required OS format anyways, regardless of what you pass 
> it, so why should we write code to cater to what the OS wants?  I'd like 
> to write string-handling code once and be done with it, not try to 
> optimize my toString functions so that they use the "right" methods for 
> the current OS.  I asserted that the only reason you want to use the 
> functions other than the char[] version is in the case where your data 
> is *stored* as wchar[] or dchar[].  Otherwise, it makes no sense to do 
> the conversion because the outputter already does it for you.  So the 
> question becomes, how often do you need to output data that's already in 
> dchar[] or wchar[] format, and is it worth passing around a list of 
> functions just in case you need that, or should you just call a 
> conversion routine the few times you need it?
> 
> Let's not forget that this is mainly for debugging...
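
The "just call a conversion routine" option from the quoted paragraph 
could look roughly like the sketch below. CharSink and putWide are 
hypothetical names assumed for illustration; std.utf.toUTF8 is the Phobos 
conversion function. The sink accepts only UTF-8, and wchar data is 
converted at the one call site that needs it.

```d
import std.utf : toUTF8;

// Hypothetical char-only sink, assumed for illustration.
struct CharSink
{
    char[] data;
    void put(const(char)[] s) { data ~= s; }
}

// Converts UTF-16 text to UTF-8 on demand, then forwards it to the sink.
void putWide(ref CharSink sink, const(wchar)[] text)
{
    sink.put(toUTF8(text));
}

unittest
{
    CharSink s;
    s.put("Array: ");
    putWide(s, "caf\u00E9"w);
    assert(s.data == "Array: caf\u00E9");
}
```

The conversion allocates a temporary string, which is the cost being 
weighed against passing around a wider interface.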

If it's mainly for debugging maybe it's not worth spending time on.

Andrei