toString refactor in druntime

Steven Schveighoffer via Digitalmars-d digitalmars-d at puremagic.com
Thu Oct 30 08:30:18 PDT 2014


On 10/28/14 7:06 PM, Manu via Digitalmars-d wrote:
> On 28 October 2014 22:51, Steven Schveighoffer via Digitalmars-d
> <digitalmars-d at puremagic.com> wrote:
>> On 10/27/14 8:01 PM, Manu via Digitalmars-d wrote:
>>>
>>> On 28 October 2014 04:40, Benjamin Thaut via Digitalmars-d
>>> <digitalmars-d at puremagic.com> wrote:
>>>>
>>>> Am 27.10.2014 11:07, schrieb Daniel Murphy:
>>>>
>>>>> "Benjamin Thaut"  wrote in message news:m2kt16$2566$1 at digitalmars.com...
>>>>>>
>>>>>>
>>>>>> I'm planning on doing a pull request for druntime which rewrites every
>>>>>> toString function within druntime to use the new sink signature. That
>>>>>> way druntime would cause a lot fewer allocations which end up being
>>>>>> garbage right away. Are there any objections against doing so? Any
>>>>>> reasons why such a pull request would not get accepted?
>>>>>
>>>>>
>>>>>
>>>>> How ugly is it going to be, since druntime can't use std.format?
>>>>
>>>>
>>>>
>>>> They wouldn't get any uglier than they already are, because the current
>>>> toString functions within druntime also can't use std.format.
>>>>
>>>> An example would be the toString function of TypeInfo_StaticArray:
>>>>
>>>> override string toString() const
>>>> {
>>>>           SizeStringBuff tmpBuff = void;
>>>>           return value.toString() ~ "[" ~
>>>>               cast(string)len.sizeToTempString(tmpBuff) ~ "]";
>>>> }
>>>>
>>>> Would be replaced by:
>>>>
>>>> override void toString(void delegate(const(char)[]) sink) const
>>>> {
>>>>           SizeStringBuff tmpBuff = void;
>>>>           value.toString(sink);
>>>>           sink("[");
>>>>           sink(cast(string)len.sizeToTempString(tmpBuff));
>>>>           sink("]");
>>>> }
>>>
>>>
>>> The thing that really worries me about this sink API is that your code
>>> here produces (at least) 4 calls to a delegate. That's a lot of
>>> indirect function calling, which can be a severe performance hazard on
>>> some systems.
>>> We're trading out garbage for low-level performance hazards, which may
>>> imply a reduction in portability.
>>
>>
>> I think given the circumstances, we are better off. But when we find a
>> platform that does perform worse, we can try and implement alternatives. I
>> don't want to destroy performance on the platforms we *do* support, for the
>> worry that some future platform isn't as friendly to this method.
>
> Video games consoles are very real, and very now.
> I suspect they may even represent the largest body of native code in
> the world today.

Sorry, I meant future *D supported* platforms, not future 
not-yet-existing platforms.

> I don't know if 'alternatives' is the right phrase, since this
> approach isn't implemented yet, and I wonder if a slightly different
> API strategy exists which may not exhibit this problem.

Well, the API already exists and is supported. The idea is to migrate 
the existing toString calls to the new API.

>> But an aggregate which relies on members to output themselves is going to
>> have a tough time following this model. Only at the lowest levels can we
>> enforce such a rule.
>
> I understand this, which is the main reason I suggest to explore
> something other than a delegate based interface.

Before we start ripping apart our existing APIs, can we show that the 
performance is really going to be so bad? I know virtual calls have a 
bad reputation, but I hate to make these choices absent real data.

For instance, D's underlying I/O system uses FILE *, which is about as 
virtual as you can get. So are you avoiding a virtual call to use a 
buffer to then pass to a virtual call later?
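To make that concrete, here is a minimal sketch (the name `fileSink` is 
hypothetical) of a sink that forwards straight to C stdio. Every fwrite 
already goes through the FILE * machinery, so the delegate call on top of 
it is a small relative cost:

```d
import core.stdc.stdio : FILE, fwrite, stdout;

// Hypothetical helper: a sink body that hands the slice straight to
// C stdio. Returns the number of bytes written, as fwrite reports it.
size_t fileSink(FILE* f, const(char)[] s)
{
    return fwrite(s.ptr, 1, s.length, f);
}

void main()
{
    // Wrap it in a delegate and feed it to a sink-style toString;
    // no GC allocation happens on this path.
    void sink(const(char)[] s) { fileSink(stdout, s); }
    sink("int");
    sink("[4]\n");
}
```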

>> Another thing to think about is that the inliner can potentially get rid of
>> the cost of delegate calls.
>
> druntime is a binary lib. The inliner has no effect on this equation.

It depends on the delegate and the item being output, whether the source 
is available to the compiler, and whether or not it's a virtual 
function. True, some cases will not be inlinable. But the "tweaks" we 
implement for platform X, which does not do well with delegate calls, 
could include making more of that source available for inlining.

>>> Ideally, I guess I'd prefer to see an overload which receives a slice
>>> to write to instead and do away with the delegate call. Particularly
>>> in druntime, where API and potential platform portability decisions
>>> should be *super*conservative.
>>
>>
>> This puts the burden on the caller to ensure enough space is allocated. Or
>> you have to reenter the function to finish up the output. Neither of these
>> seem like acceptable drawbacks.
>
> Well that's why I open for discussion. I'm sure there's room for
> creativity here.
>
> It doesn't seem that unreasonable to reenter the function to me
> actually, I'd prefer a second static call in the rare event that a
> buffer wasn't big enough, to many indirect calls in every single case.

A reentrant function has to track the state of what has been output, 
which is horrific in my opinion.

> There's no way that reentry would be slower. It may be more
> inconvenient, but I wonder if some API creativity could address
> that...?

The largest problem I see is, you may not know before you start 
generating strings whether it will fit in the buffer, and therefore, you 
may still end up eventually calling the sink.
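For reference, the slice-receiving variant would look roughly like C's 
snprintf: write what fits, report what was needed, and make the caller 
retry on overflow. This is only a sketch with hypothetical names 
(`toStringBuf`, and the length pre-formatted as a string for simplicity):

```d
// Hypothetical slice-based alternative: format "value[len]" into the
// caller's buffer and return the full size needed, like snprintf. If
// the return value exceeds buf.length, the caller must call again with
// a larger buffer -- redoing the formatting work up to the overflow.
size_t toStringBuf(char[] buf, string value, string lenStr)
{
    const(char)[][4] parts = [value, "[", lenStr, "]"];
    size_t needed;
    foreach (p; parts)
    {
        // Copy only pieces that fit entirely, but keep counting so
        // the caller learns the total required size.
        if (needed + p.length <= buf.length)
            buf[needed .. needed + p.length] = p[];
        needed += p.length;
    }
    return needed;
}
```

Note the retry loop this forces on every caller whose first buffer was 
too small; that is the state-tracking cost in caller-visible form.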

Note, you can always allocate a stack buffer, use an inner function as a 
delegate, and get the inliner to remove the indirect calls. Or use an 
alternative private mechanism to build the data.
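A sketch of that stack-buffer-plus-inner-function approach (the helper 
name `collectToString` and the 64-byte buffer size are hypothetical):

```d
// Hypothetical caller-side helper: run a sink-style toString through
// an inner function that writes into a stack buffer and spills to the
// GC heap only on overflow. When the compiler can see both sides, the
// indirect calls become candidates for inlining.
char[] collectToString(void delegate(void delegate(const(char)[])) toStr)
{
    char[64] stackBuf = void;
    size_t used;
    char[] overflow;

    void sink(const(char)[] s)
    {
        if (overflow is null && used + s.length <= stackBuf.length)
        {
            stackBuf[used .. used + s.length] = s[];
            used += s.length;
        }
        else
        {
            if (overflow is null)
                overflow = stackBuf[0 .. used].dup; // spill once
            overflow ~= s;
        }
    }

    toStr(&sink);
    return overflow is null ? stackBuf[0 .. used].dup : overflow;
}
```

A caller would use it as, e.g., 
`collectToString((sink){ sink("int"); sink("["); sink("4"); sink("]"); })`.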

Would you say that *one* delegate call per object output is OK?

>> What would you propose for such a mechanism? Maybe I'm not thinking of your
>> ideal API.
>
> I haven't thought of one I'm really happy with.
> I can imagine some 'foolproof' solution at the API level which may
> accept some sort of growable string object (which may represent a
> stack allocation by default). This could lead to a virtual call if the
> buffer needs to grow, but that's not really any worse than a delegate
> call, and it's only in the rare case of overflow, rather than many
> calls in all cases.
>

This is a typical mechanism that Tango used -- pass in a ref to a 
dynamic array referencing a stack buffer. If it needed to grow, just 
update the length, and it moves to the heap. In most cases, the stack 
buffer is enough. But the idea is to try and minimize the GC 
allocations, which are performance killers on the current platforms.
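The pattern can be sketched as follows. One caveat for current D: plain 
`~=` on a slice of stack memory reallocates on the first append (the GC 
does not own the stack buffer), so a faithful version tracks capacity 
itself. All names here are hypothetical:

```d
// Tango-style growable buffer: starts on the stack, moves to the GC
// heap only when it overflows. In the common case, no GC allocation.
struct GrowBuf(size_t N)
{
    char[N] stack = void;
    char[] heap;     // stays null while the stack buffer suffices
    size_t used;

    void put(const(char)[] s)
    {
        if (heap is null && used + s.length <= N)
        {
            stack[used .. used + s.length] = s[];
            used += s.length;
        }
        else
        {
            if (heap is null)
                heap = stack[0 .. used].dup; // one GC move on overflow
            heap ~= s;
        }
    }

    const(char)[] data() { return heap is null ? stack[0 .. used] : heap; }
}
```

Typical use: declare `GrowBuf!64 b;`, pass `&b.put` as the sink, and read 
`b.data` at the end; the heap is touched only if 64 bytes were not enough.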

I think adding the option of using a delegate is not limiting -- you can 
always, on a platform that needs it, implement an alternative protocol 
that is internal to druntime. We are not preventing such protocols by 
adding the delegate version.

But on our currently supported platforms, the delegate is so much better 
than the GC call. I can't see any reason to prefer the latter.

-Steve

