Semantics of toString

Thu Nov 12 08:14:56 PST 2009

On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> Steven Schveighoffer wrote:
>> On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu  
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>> I think the best option for toString is to take an output range and  
>>> write to it. (The sink is a simplified range.)
>>
>> Bad idea...
>>
>> A range only makes sense as a struct, not an interface/object.  I'll
>> tell you why: performance.
>
> You are right. If range interfaces accommodate block transfers, this  
> problem may be addressed. I agree that one virtual call per character  
> output would be overkill. (I seem to recall it's one of the reasons why  
> C++'s iostreams are so inefficient.)

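IIRC, C++ iostreams don't make a virtual call per character.  The virtual
functions are in the stream buffer (std::streambuf), and a single-character
write only reaches one (overflow) when the buffer is full, so I don't think
they use the "one char at a time" method.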

>
>> Ranges are special in two respects:
>>
>> 1. They are foreachable.  I think everyone agrees that calling 2
>> interface functions per loop iteration is much lower performing than
>> using opApply, which calls one delegate function per iteration.  My
>> recommendation -- use opApply when dealing with polymorphism.  I don't
>> think there's a way around this.
>>
>> 2. They are useful for passing to std.algorithm.  But std.algorithm is
>> template-interfaced.  No need for using interfaces because the correct
>> instantiation will be chosen.
>>
>> If you are intending to add a streaming module that uses ranges, would
>> it not be templated for the range type as std.algorithm is?  If not,
>> the next logical choice is a delegate, which requires no vtable
>> lookup.  Using an interface is just asking for a performance penalty
>> for not much gain.
>
> I think the cost of calling through the delegate is roughly the same as  
> a virtual call.

Not exactly.  I think you are right that struct member calls are faster
than delegates, but only slightly.  The difference is that a struct
member call does not need to load the function address from the stack; it
can hard-code the address directly.

However, virtual calls have to be slower because you are doing two
indirections: one to the class vtable, then one to the function address
itself.  Plus neither of those locations is on the stack (the object
holding the vtable pointer is most likely on the heap), so they may not be
in the cache.
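
To make the three call shapes concrete, here is a minimal sketch.  All of
the names (Sink, CountingSink, StructSink, viaStruct, viaDelegate,
viaInterface) are made up for illustration, and the comments describe the
usual code generation, not a measurement:

```d
// Hypothetical types; only the shape of each call matters here.
interface Sink { void put(const(char)[] s); }

final class CountingSink : Sink
{
    size_t total;
    void put(const(char)[] s) { total += s.length; }
}

struct StructSink
{
    size_t total;
    void put(const(char)[] s) { total += s.length; }
}

size_t viaStruct(const(char)[] s)
{
    StructSink sink;
    sink.put(s);          // target address is known at compile time
    return sink.total;
}

size_t viaDelegate(const(char)[] s)
{
    StructSink sink;
    // A delegate is a (context pointer, function pointer) pair.
    void delegate(const(char)[]) dg = &sink.put;
    dg(s);                // one indirect call through the stored pointer
    return sink.total;
}

size_t viaInterface(const(char)[] s)
{
    auto obj = new CountingSink;
    Sink sink = obj;
    sink.put(s);          // load vtable pointer, then function address, then call
    return obj.total;
}
```

All three do the same work; they differ only in how the callee's address
is found at the call site.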

>
>> x.toString(outputRange, format)
>>
>> and
>>
>> x.toString(&outputRange.sink, format)
>>
>> is pretty darn minimal, and if outputRange is an interface or object,
>> this saves a virtual call per buffer write.  Plus the second form is
>> more universal, you can pass any delegate, and not have to use a range
>> type to wrap a delegate.
>>
>> Don't fall into the "OOP newbie" trap -- where just because you've
>> found a new concept that is amazing, you want to use it for
>> everything.  I say this because I've seen in the past where someone
>> discovers the power of OOP and then wants to use it for everything,
>> when in some cases, it's overkill.  Just look at some Java "classes"...
>
> There is no need to worry that I'll fall into at least that particular  
> OOP newbie trap.
>
> What I think we should do is define a text output interface that allows  
> writing individual characters of all widths and also arrays of all  
> widths. That would be a universal means for text output.
>
> interface TextOutputStream {
>      void put(dchar); // also accommodates char and wchar
>      void put(in char[]);
>      void put(in wchar[]);
>      void put(in dchar[]);
> }
>
> The toString method (re-baptized as toStream) would take such an
> interface. Better ideas are always welcome. Perhaps I'm falling into
> another OOP newbie trap! (Seriously!)

This still fits within a single function, which takes one of the 3 widths
(pick one, they can all be translated to each other):

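void put(in char[] str)
{
   // foreach with a dchar loop variable decodes the UTF-8 input
   foreach(dchar dc; str)
   {
      // forward each code point as a one-element slice to put(in dchar[])
      put((&dc)[0..1]);
   }
}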

Note that you probably want to build a buffer of dchars instead of putting  
one at a time, but you get the idea.

Also, putting a single character is probably pretty uncommon, but can be  
handled in a similar fashion.
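
As a rough sketch of the buffered variant mentioned above: DcharSink, its
fields, and the 64-element buffer size are all assumptions made up for
this example.  Only put(const(dchar)[]) does real work; the other two
overloads translate and forward to it.

```d
// Hypothetical sink; put(const(dchar)[]) is the one "real" output function.
struct DcharSink
{
    dchar[] data;   // everything written so far
    size_t calls;   // number of block writes that reached the real sink

    void put(const(dchar)[] block)
    {
        data ~= block;
        ++calls;
    }

    // Decode UTF-8 into a small stack buffer and forward whole blocks.
    void put(const(char)[] str)
    {
        dchar[64] buf;
        size_t n = 0;
        foreach (dchar dc; str)     // foreach decodes char[] into dchars
        {
            buf[n++] = dc;
            if (n == buf.length)    // buffer full: flush it
            {
                put(buf[0 .. n]);
                n = 0;
            }
        }
        if (n > 0)
            put(buf[0 .. n]);       // flush the remainder
    }

    // A single character is forwarded as a one-element slice.
    void put(dchar c)
    {
        put((&c)[0 .. 1]);
    }
}
```

With this layout a short string costs one block write instead of one
write per character.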

That being said, one other point that makes all this moot is -- toString
is for debugging, not for general-purpose use.  We don't need to support
everything that is possible.  You should be able to say "hey, toString
only accepts char[], deal."  Of course, you could substitute wchar[] or
dchar[], but I think by far char[] is the most common (and is the default
type for string literals).

That's not to say there is no reason to have a TextOutputStream object.   
Such a thing is perfectly usable for a toString which takes a char[]  
delegate sink, just pass &put.  In fact, there could be a default toString  
function in Object that does just that:

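class Object
{
    ...
    // default implementation writes nothing
    void toString(void delegate(in char[] buf) put, string fmt) const
    {}
    // forwards to the delegate version using the stream's char[] put
    void toString(TextOutputStream tos, string fmt) const
    { toString(&tos.put, fmt); }
}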

Of course, then TextOutputStream has to be druntime-accessible, so maybe  
it's not a great idea...  But there are ways around that:

abstract class BaseTextOutputStream : TextOutputStream {
     void format(const Object o, string fmt) { o.toString(&this.put, fmt); }
}
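
For illustration, here is how a type and a caller might look with the
delegate-sink form.  This is only a sketch: Point and collect are
hypothetical names, and it uses a struct so that it stands alone without
any change to Object.

```d
import std.conv : to;

// Hypothetical value type used only for this example.
struct Point
{
    int x, y;

    // Writes its text in pieces to whatever sink the caller supplies.
    // fmt is accepted but ignored in this sketch.
    void toString(scope void delegate(const(char)[]) put, string fmt) const
    {
        put("(");
        put(x.to!string);
        put(", ");
        put(y.to!string);
        put(")");
    }
}

// Hypothetical helper: gathers the pieces into a single string.
string collect(T)(const T value, string fmt = "")
{
    char[] buf;
    value.toString((const(char)[] piece) { buf ~= piece; }, fmt);
    return buf.idup;
}
```

The caller decides where the characters go: collect appends them to a
buffer here, but the same toString could be handed &tos.put from a stream
object or a delegate that writes straight to a file.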

>> From another thread:
>>> Walter does not feel strongly about Phobos.
>>
>> Huh?  I feel like this sentence doesn't make sense, so maybe there's a
>> typo.
>
> I meant to say, Walter does not want to do library design.

I'm trying to remember; I thought he did care about this particular
issue, but it may be muddled in my memory.  Also note that toString has
special status from the compiler with regard to structs (that hack with
the xtoString function in the struct's typeinfo), so it doesn't just
affect library code.

-Steve