Semantics of toString

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Nov 12 08:46:48 PST 2009


Steven Schveighoffer wrote:
> On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> Steven Schveighoffer wrote:
>>> On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu 
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> I think the best option for toString is to take an output range and 
>>>> write to it. (The sink is a simplified range.)
>>>  Bad idea...
>>>  A range only makes sense as a struct, not an interface/object.  I'll 
>>> tell you why: performance.
>>
>> You are right. If range interfaces accommodate block transfers, this 
>> problem may be addressed. I agree that one virtual call per character 
>> output would be overkill. (I seem to recall it's one of the reasons 
>> why C++'s iostreams are so inefficient.)
> 
> IIRC, I don't think C++ iostreams use polymorphism

Oh yes they do. (Did you even google?) Virtual multiple inheritance, the 
works.

http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/

>, and I don't think 
> they use the "one char at a time" method.

Well they do offer one char at a time and also a block transfer.

http://msdn.microsoft.com/en-us/library/760t8w1z%28VS.80%29.aspx

I'm not sure how, but they still manage to call one virtual method per 
char; otherwise they'd be plenty fast, which they aren't. I seem to 
recall write() has a default implementation that calls put() in a loop 
or something. It's not a topic that I want to study closely. iostreams 
suck; why spend time learning the quirks of a broken design?

>>> Ranges are special in two respects:
>>>  1. They are foreachable.  I think everyone agrees that calling 2 
>>> interface functions per loop iteration is much lower performing than 
>>> using opApply, which calls one delegate function per loop.  My 
>>> recommendation -- use opApply when dealing with polymorphism.  I 
>>> don't think there's a way around this.
>>>
>>> 2. They are useful for passing to std.algorithm.  But std.algorithm 
>>> is template-interfaced.  No need for using interfaces because the 
>>> correct instantiation will be chosen.
>>>  If you are intending to add a streaming module that uses ranges, 
>>> would it not be templated for the range type as std.algorithm is?  If 
>>> not, the next logical choice is a delegate, which requires no vtable 
>>> lookup.  Using an interface is just asking for a performance penalty 
>>> for not much gain.
>>
>> I think the cost of calling through the delegate is roughly the same 
>> as a virtual call.
> 
> Not exactly.  I think you are right that struct member calls are faster 
> than delegates, but only slightly.  The difference being that a struct 
> member call does not need to load the function address from the stack, 
> it can hard-code the address directly.
> 
> However, virtual calls have to be lower performing because you are doing 
> two indirections, one to the class vtable, then one to the function 
> address itself.  Plus those two locations are most likely located on the 
> heap, not the stack, and so may not be in the cache.

I think the only way to figure this out is to measure. For one thing, I 
disagree with the comment about the cache: a vtable is quite likely to 
be warm after a couple of calls.

I know one thing - Walter's old format function used delegates and it 
was unusably slow.
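
A minimal sketch of such a measurement, assuming a current Phobos with 
std.datetime.stopwatch. The names Sink, Counter, Totals and compare are 
made up for illustration and come from no library. Note that the first 
loop calls through an interface reference, which is not quite the same 
as a plain class virtual call, and that an optimizer may change the 
picture entirely, so treat any numbers it prints with suspicion.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// Hypothetical sink interface; each put() through it is an indirect call.
interface Sink
{
    void put(in char[] s);
}

// Hypothetical implementation that only counts the characters it receives.
class Counter : Sink
{
    size_t total;
    void put(in char[] s) { total += s.length; }
}

struct Totals
{
    size_t viaInterface;
    size_t viaDelegate;
}

// Pushes n one-character writes through an interface reference and then
// through a delegate, printing the elapsed time of each loop.
Totals compare(size_t n)
{
    auto c1 = new Counter;
    Sink s = c1;
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. n)
        s.put("x");
    auto tInterface = sw.peek();

    auto c2 = new Counter;
    void delegate(in char[]) dg = &c2.put;
    sw.reset(); // resets the elapsed time; the stopwatch keeps running
    foreach (i; 0 .. n)
        dg("x");
    auto tDelegate = sw.peek();

    writefln("interface: %s, delegate: %s", tInterface, tDelegate);
    return Totals(c1.total, c2.total);
}
```

Calling compare(10_000_000) from main with and without optimizations 
would be a reasonable first experiment; the returned totals are only 
there to keep the loops from being discarded.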

>>> x.toString(outputRange, format)
>>>  and
>>>  x.toString(&outputRange.sink, format)
>>>  is pretty darn minimal, and if outputRange is an interface or 
>>> object, this saves a virtual call per buffer write.  Plus the second 
>>> form is more universal, you can pass any delegate, and not have to 
>>> use a range type to wrap a delegate.
>>>  Don't fall into the "OOP newbie" trap -- where just because you've 
>>> found a new concept that is amazing, you want to use it for 
>>> everything.  I say this because I've seen in the past where someone 
>>> discovers the power of OOP and then wants to use it for everything, 
>>> when in some cases, it's overkill.  Just look at some Java "classes"...
>>
>> There is no need to worry that I'll fall into at least that particular 
>> OOP newbie trap.
>>
>> What I think we should do is define a text output interface that 
>> allows writing individual characters of all widths and also arrays of 
>> all widths. That would be a universal means for text output.
>>
>> interface TextOutputStream {
>>      void put(dchar); // also accommodates char and wchar
>>      void put(in char[]);
>>      void put(in wchar[]);
>>      void put(in dchar[]);
>> }
>>
>> The toString method (re-baptized as toStream) would take such an 
>> interface. Better ideas are always welcome. Perhaps I'm falling 
>> into another OOP newbie trap! (Seriously!)
> 
> This still fits within a single function, which takes one of the 3 
> widths (pick one, they can all be translated to each other):
> 
> void put(in char[] str)
> {
>   foreach(dchar dc; str)
>   {
>      put((&dc)[0..1]);
>   }
> }
> 
> Note that you probably want to build a buffer of dchars instead of 
> putting one at a time, but you get the idea.

I don't get the idea. I'm seeing one virtual call per character.
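
For comparison, here is a minimal sketch of what a buffered version 
might look like, so that the virtual call happens once per filled 
buffer instead of once per character. TextOutputStream is cut down to 
its dchar[] overload, and putBuffered, CountingStream and the buffer 
size of 64 are assumptions made purely for illustration.

```d
// Hypothetical interface, reduced to the dchar[] overload for brevity.
interface TextOutputStream
{
    void put(in dchar[] chunk);
}

// Transcodes UTF-8 to UTF-32 through a fixed stack buffer and hands the
// buffer to put() whenever it fills up, plus once more for any remainder.
void putBuffered(TextOutputStream tos, in char[] str)
{
    dchar[64] buf; // arbitrary size, chosen for the example
    size_t n = 0;
    foreach (dchar dc; str) // foreach decodes the UTF-8 code points
    {
        buf[n++] = dc;
        if (n == buf.length)
        {
            tos.put(buf[0 .. n]);
            n = 0;
        }
    }
    if (n > 0)
        tos.put(buf[0 .. n]);
}

// Hypothetical stream that records how many times put() was called.
class CountingStream : TextOutputStream
{
    size_t calls;
    dchar[] data;

    void put(in dchar[] chunk)
    {
        ++calls;
        data ~= chunk; // copies the characters out of the caller's buffer
    }
}

unittest
{
    auto s = new CountingStream;
    putBuffered(s, "hello, world");
    assert(s.calls == 1); // 12 characters fit in a single buffer
    assert(s.data == "hello, world"d);
}
```

The price is a transcoding pass and a copy through the buffer, which 
is the overhead that picking one favored width would avoid.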

> Also, putting a single character is probably pretty uncommon, but can be 
> handled in a similar fashion.

I'm not sure how uncommon outputting one character really is, but it 
may be good to discourage it just to avoid fostering slow code.

> That being said, one other point that makes all this moot is -- toString 
> is for debugging, not for general purpose.  We don't need to support 
> everything that is possible.  You should be able to say "hey, toString 
> only accepts char[], deal."  Of course, you could substitute wchar[] or 
> dchar[], but I think by far char[] is the most common (and is the 
> default type for string literals).

I was hoping we could elevate the usefulness of toString a bit.

> That's not to say there is no reason to have a TextOutputStream object.  
> Such a thing is perfectly usable for a toString which takes a char[] 
> delegate sink, just pass &put.  In fact, there could be a default 
> toString function in Object that does just that:
> 
> class Object
> {
>    ...
>    void toString(void delegate(in char[] buf) put, string fmt) const
>    {}
>    void toString(TextOutputStream tos, string fmt) const
>    { toString(&tos.put, fmt); }
> }

I'd agree with the delegate idea if we established that UTF-8 is favored 
over all other encodings.
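
Under that assumption, a minimal sketch of the delegate form could look 
like this. Point is a made-up example type, the format argument from 
the proposal above is left out for brevity, and the int-to-string 
conversion through std.conv allocates, so this illustrates the shape of 
the API and makes no claim about speed.

```d
import std.conv : to;

// Hypothetical value type whose toString writes UTF-8 pieces to a sink.
struct Point
{
    int x, y;

    // The sink receives each fragment in order; toString itself never
    // concatenates them into one string.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("(");
        sink(x.to!string);
        sink(", ");
        sink(y.to!string);
        sink(")");
    }
}

unittest
{
    char[] collected;
    size_t calls;
    auto p = Point(3, 4);
    // Any delegate will do as a sink; this one appends to a local array.
    p.toString((const(char)[] chunk) { collected ~= chunk; ++calls; });
    assert(collected == "(3, 4)");
    assert(calls == 5); // one call per fragment
}
```

A consumer that wants wchar or dchar output would have to transcode 
inside its sink, which is exactly the cost of favoring UTF-8.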


Andrei


