Semantics of toString

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Nov 12 10:46:08 PST 2009


Steven Schveighoffer wrote:
> On Thu, 12 Nov 2009 11:46:48 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> Steven Schveighoffer wrote:
>>> On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> Steven Schveighoffer wrote:
>>>>> On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
>>>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>>>
>>>>>> I think the best option for toString is to take an output range 
>>>>>> and write to it. (The sink is a simplified range.)
>>>>>  Bad idea...
>>>>>  A range only makes sense as a struct, not an interface/object.  
>>>>> I'll tell you why: performance.
>>>>
>>>> You are right. If range interfaces accommodate block transfers, this 
>>>> problem may be addressed. I agree that one virtual call per 
>>>> character output would be overkill. (I seem to recall it's one of 
>>>> the reasons why C++'s iostreams are so inefficient.)
>>>  IIRC, I don't think C++ iostreams use polymorphism
>>
>> Oh yes they do. (Did you even google?) Virtual multiple inheritance, 
>> the works.
>>
>> http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/ 
>>
> 
>  From my C++ book, it appears to only use virtual inheritance.  I don't 
> know enough about virtual inheritance to know how that changes function 
> calls.
> 
> As far as virtual functions, only the destructor is virtual, so there is 
> no issue there.

You're right, but there is an issue because, as far as I can recall, these 
functions' implementations do end up calling a virtual function per char; 
that might be streambuf::overflow. I'm not keen on investigating this any 
further, but I'd be grateful if you shared any related knowledge. At the 
end of the day, there seems to be violent agreement that we don't want 
one virtual call per character or one delegate call per character.
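
For what it's worth, here is a minimal C++ sketch of the per-character 
cost, not a measurement of any real iostreams implementation. It assumes 
a stream buffer with no put area, in which case the standard specifies 
that each sputc() falls through to the virtual overflow(). CountingBuf 
and virtual_calls_for are made-up names for illustration only.

```cpp
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical streambuf with no put area, so every sputc() falls through
// to the virtual overflow(). Not taken from any real library.
class CountingBuf : public std::streambuf {
public:
    explicit CountingBuf(bool block_transfers) : block_(block_transfers) {}

    std::size_t virtual_calls = 0;  // overflow() calls plus block xsputn() calls
    std::string data;               // everything written so far

protected:
    // Called once per character whenever there is no room in the put area.
    int_type overflow(int_type ch) override {
        ++virtual_calls;
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            data.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Block transfer: one virtual call consumes the whole span.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        if (!block_)
            return std::streambuf::xsputn(s, n);  // default: as if by repeated sputc()
        ++virtual_calls;
        data.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    bool block_;
};

// Writes text through the public sputn() and reports the counted virtual calls.
std::size_t virtual_calls_for(const std::string& text, bool block_transfers) {
    CountingBuf buf(block_transfers);
    buf.sputn(text.data(), static_cast<std::streamsize>(text.size()));
    return buf.virtual_calls;
}
```

Under those assumptions, virtual_calls_for("hello", false) counts five 
overflow() calls, while virtual_calls_for("hello", true) counts a single 
block call. A real streambuf with a reasonably sized put area sits 
somewhere in between.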

>>>  void put(in char[] str)
>>> {
>>>   foreach(dchar dc; str)
>>>   {
>>>      put((&dc)[0..1]);
>>>   }
>>> }
>>>  Note that you probably want to build a buffer of dchars instead of 
>>> putting one at a time, but you get the idea.
>>
>> I don't get the idea. I'm seeing one virtual call per character.
> 
> You missed the note.  I didn't implement it, but you could easily 
> implement a stack-allocated buffer to cache the conversions, passing 
> multiple converted code-points at once.  But I don't think it's even 
> worth discussing per my other points.
> 
>>> That being said, one other point that makes all this moot is -- 
>>> toString is for debugging, not for general purpose.  We don't need to 
>>> support everything that is possible.  You should be able to say "hey, 
>>> toString only accepts char[], deal."  Of course, you could substitute 
>>> wchar[] or dchar[], but I think by far char[] is the most common (and 
>>> is the default type for string literals).
>>
>> I was hoping we could elevate the usefulness of toString a bit.
> 
> Whatever kind of data the output stream gets, it's going to convert it 
> to the format it wants anyways (as for stdout, I think that would be 
> utf8), the only benefit is if you have data stored in a different width 
> that you wanted to output.  Calling a conversion function in that case I 
> think is reasonable enough, and saves the output stream from having to 
> convert/deal with it.
> 
> In other words, I don't think it's going to be that common a case where 
> you need anything other than utf8 output, and therefore the cost of 
> creating an interface, making virtual calls, disallowing simple delegate 
> passing etc is worth the convenience *just in case* you have data stored 
> as wchar[] you want to output.

I'm not sure.

http://www.gnu.org/s/libc/manual/html_node/Streams-and-I18N.html#Streams-and-I18N

GNU defines a means to set and detect a UTF-16 console, which dmd observes 
(grep std/ for fwide). But then I'm not sure how many people are using that 
kind of stuff.

>>> That's not to say there is no reason to have a TextOutputStream 
>>> object.  Such a thing is perfectly usable for a toString which takes 
>>> a char[] delegate sink, just pass &put.  In fact, there could be a 
>>> default toString function in Object that does just that:
>>>  class Object
>>> {
>>>    ...
>>>    void toString(void delegate(in char[] buf) put, string fmt) const
>>>    {}
>>>    void toString(TextOutputStream tos, string fmt) const
>>>    { toString(&tos.put, fmt); }
>>> }
>>
>> I'd agree with the delegate idea if we established that UTF-8 is 
>> favored compared to all other formats.
> 
> D seems to favor UTF8 -- it is the default type for string literals.  I 
> don't think I've ever used dchar, and I usually only use wchar to talk 
> to Win32 functions when required.
> 
> The question I'd ask is -- how common is it where the versions other 
> than char[] would be more convenient?

I don't know. I think Asian-language users might give a salient answer.


Andrei


