Semantics of toString
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Nov 12 08:46:48 PST 2009
Steven Schveighoffer wrote:
> On Thu, 12 Nov 2009 10:29:17 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> Steven Schveighoffer wrote:
>>> On Tue, 10 Nov 2009 18:49:54 -0500, Andrei Alexandrescu
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> I think the best option for toString is to take an output range and
>>>> write to it. (The sink is a simplified range.)
>>> Bad idea...
>>> A range only makes sense as a struct, not an interface/object. I'll
>>> tell you why: performance.
>>
>> You are right. If range interfaces accommodate block transfers, this
>> problem may be addressed. I agree that one virtual call per character
>> output would be overkill. (I seem to recall it's one of the reasons
>> why C++'s iostreams are so inefficient.)
>
> IIRC, I don't think C++ iostreams use polymorphism
Oh yes they do. (Did you even google?) Virtual multiple inheritance, the
works.
http://www.deitel.com/articles/cplusplus_tutorials/20060225/virtualBaseClass/
>, and I don't think
> they use the "one char at a time" method.
Well they do offer one char at a time and also a block transfer.
http://msdn.microsoft.com/en-us/library/760t8w1z%28VS.80%29.aspx
I'm not sure how the heck but they still manage to call one virtual
method per char, otherwise they'd be plenty fast, which they aren't. I
seem to recall write() has a default implementation that calls put() in
a loop or something. It's not a topic that I want to study closely.
iostreams suck; why spend time learning the quirks of a broken design?
>>> Ranges are special in two respects:
>>> 1. They are foreachable. I think everyone agrees that calling 2
>>> interface functions per loop iteration is much lower performing than
>>> using opApply, which calls one delegate function per loop. My
>>> recommendation -- use opApply when dealing with polymorphism. I
>>> don't think there's a way around this.
>>>
>>> 2. They are useful for passing to std.algorithm. But std.algorithm
>>> is template-interfaced. No need for using interfaces because the
>>> correct instantiation will be chosen.
>>> If you are intending to add a streaming module that uses ranges,
>>> would it not be templated for the range type as std.algorithm is? If
>>> not, the next logical choice is a delegate, which requires no vtable
>>> lookup. Using an interface is just asking for a performance penalty
>>> for not much gain.
>>
>> I think the cost of calling through the delegate is roughly the same
>> as a virtual call.
>
> Not exactly. I think you are right that struct member calls are faster
> than delegates, but only slightly. The difference being that a struct
> member call does not need to load the function address from the stack,
> it can hard-code the address directly.
>
> However, virtual calls have to be lower performing because you are doing
> two indirections, one to the class vtable, then one to the function
> address itself. Plus those two locations are most likely located on the
> heap, not the stack, and so may not be in the cache.
I think the only way to figure it out is to measure. For one thing I disagree
with the comment about the cache - a vtable is quite likely to be warm
after a couple of calls.
I know one thing - Walter's old format function used delegates and it
was unusably slow.
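
A minimal sketch of how such a measurement could be set up, assuming a current Phobos with std.datetime.stopwatch. The names Sink and Counter are hypothetical and exist only for this illustration. The timings depend heavily on compiler, flags, and hardware, and an optimizer may devirtualize or inline either path, so the numbers are a starting point rather than a verdict.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Hypothetical one-method sink, used only to compare call mechanisms.
interface Sink
{
    void put(char c);
}

final class Counter : Sink
{
    size_t n;
    void put(char c) { n += c; }
}

void main()
{
    auto obj = new Counter;
    Sink viaInterface = obj;                    // call through the interface
    void delegate(char) viaDelegate = &obj.put; // call through a delegate

    // Each benchmarked function performs 1000 calls of one kind.
    void callInterface() { foreach (i; 0 .. 1000) viaInterface.put('x'); }
    void callDelegate()  { foreach (i; 0 .. 1000) viaDelegate('x'); }

    // Run each function 10_000 times and report the total time of each.
    auto r = benchmark!(callInterface, callDelegate)(10_000);
    writefln("interface: %s", r[0]);
    writefln("delegate:  %s", r[1]);
}
```

Swapping the order of the two functions, or running the program several times, helps to check that cache warm-up is not dominating the result.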
>>> x.toString(outputRange, format)
>>> and
>>> x.toString(&outputRange.sink, format)
>>> is pretty darn minimal, and if outputRange is an interface or
>>> object, this saves a virtual call per buffer write. Plus the second
>>> form is more universal, you can pass any delegate, and not have to
>>> use a range type to wrap a delegate.
>>> Don't fall into the "OOP newbie" trap -- where just because you've
>>> found a new concept that is amazing, you want to use it for
>>> everything. I say this because I've seen in the past where someone
>>> discovers the power of OOP and then wants to use it for everything,
>>> when in some cases, it's overkill. Just look at some Java "classes"...
>>
>> There is no need to worry that I'll fall into at least that particular
>> OOP newbie trap.
>>
>> What I think we should do is define a text output interface that
>> allows writing individual characters of all widths and also arrays of
>> all widths. That would be a universal means for text output.
>>
>> interface TextOutputStream {
>>     void put(dchar); // also accommodates char and wchar
>>     void put(in char[]);
>>     void put(in wchar[]);
>>     void put(in dchar[]);
>> }
>>
>> The toString method (re-baptized as toStream) would take such an
>> interface. Better ideas are always welcome. Perhaps I'm falling
>> into another OOP newbie trap! (Seriously!)
>
> This still fits within a single function, which takes one of the 3
> widths (pick one, they can all be translated to each other):
>
> void put(in char[] str)
> {
>     foreach(dchar dc; str)
>     {
>         put((&dc)[0..1]);
>     }
> }
>
> Note that you probably want to build a buffer of dchars instead of
> putting one at a time, but you get the idea.
I don't get the idea. I'm seeing one virtual call per character.
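
To make the call counts concrete, here is a small sketch that counts interface calls for per-character forwarding versus a single block transfer. TextSink, CountingSink, writePerChar and writeBlock are hypothetical names, loosely modeled on the TextOutputStream proposal quoted above; this is not an existing library API.

```d
import std.stdio : writeln;

// Hypothetical sink interface with a per-character and a block entry point.
interface TextSink
{
    void put(dchar c);
    void put(const(char)[] s);
}

// Records how often each entry point is reached through the interface.
class CountingSink : TextSink
{
    size_t charCalls;
    size_t blockCalls;
    char[] data;

    void put(dchar c)
    {
        import std.utf : encode;
        ++charCalls;
        char[4] buf;
        immutable n = encode(buf, c); // re-encode the code point as UTF-8
        data ~= buf[0 .. n];
    }

    void put(const(char)[] s)
    {
        ++blockCalls;
        data ~= s;
    }
}

// Decodes the string and forwards one code point per interface call.
void writePerChar(TextSink sink, const(char)[] s)
{
    foreach (dchar c; s)
        sink.put(c);
}

// Forwards the whole slice in a single interface call.
void writeBlock(TextSink sink, const(char)[] s)
{
    sink.put(s);
}

void main()
{
    auto a = new CountingSink;
    writePerChar(a, "hello");
    auto b = new CountingSink;
    writeBlock(b, "hello");

    writeln("per-character forwarding: ", a.charCalls, " interface calls");
    writeln("block forwarding: ", b.blockCalls, " interface call(s)");
    assert(a.data == b.data); // same output either way
}
```

Both paths produce the same text; only the number of calls through the interface differs, and it grows with the length of the string in the per-character case.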
> Also, putting a single character is probably pretty uncommon, but can be
> handled in a similar fashion.
I'm not sure how uncommon outputting one character is, but it
may be good to discourage it so as not to foster slow code.
> That being said, one other point that makes all this moot is -- toString
> is for debugging, not for general purpose. We don't need to support
> everything that is possible. You should be able to say "hey, toString
> only accepts char[], deal." Of course, you could substitute wchar[] or
> dchar[], but I think by far char[] is the most common (and is the
> default type for string literals).
I was hoping we could elevate the usefulness of toString a bit.
> That's not to say there is no reason to have a TextOutputStream object.
> Such a thing is perfectly usable for a toString which takes a char[]
> delegate sink, just pass &put. In fact, there could be a default
> toString function in Object that does just that:
>
> class Object
> {
>     ...
>     void toString(void delegate(in char[] buf) put, string fmt) const
>     {}
>     void toString(TextOutputStream tos, string fmt) const
>     { toString(&tos.put, fmt); }
> }
I'd agree with the delegate idea if we established that UTF-8 is favored
over all other encodings.
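
A minimal sketch of the delegate-based form under that UTF-8 assumption. Point is a hypothetical example type, and the format argument from the quoted proposal is left out for brevity. The object writes pieces of text to whatever sink it is handed, and the caller decides where the text ends up.

```d
import std.conv : to;

// Hypothetical example type; only the sink-based toString matters here.
struct Point
{
    int x, y;

    // Writes the textual form piecewise to a caller-supplied sink
    // instead of allocating and returning a string.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink("(");
        sink(x.to!string);
        sink(", ");
        sink(y.to!string);
        sink(")");
    }
}

void main()
{
    char[] result;

    // Any delegate accepting a slice of UTF-8 text can serve as the sink;
    // here it simply appends to a local buffer.
    Point(3, 4).toString((const(char)[] s) { result ~= s; });

    assert(result == "(3, 4)");
}
```

The sink could just as well write to a file or a socket; the type being printed does not need to know. Current versions of std.format recognize a toString with this signature, so the same method also serves formatted output.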
Andrei