D performance

Steven Schveighoffer schveiguy at gmail.com
Sun Apr 26 16:20:19 UTC 2020


On 4/25/20 6:34 AM, Joseph Rushton Wakeling wrote:
> On Saturday, 25 April 2020 at 10:15:33 UTC, Walter Bright wrote:
>> On 4/24/2020 12:27 PM, Arine wrote:
>>> There most definitely is a difference and the assembly generated with 
>>> rust is better.
>> D's @live functions can indeed do such optimizations, though I haven't 
>> got around to implementing them in DMD's optimizer. There's nothing 
>> particularly difficult about it.
> 
> In any case, I seriously doubt those kinds of optimization have anything 
> to do with the web framework performance differences.
> 
> My experience of writing number-crunching stuff in D and Rust is that 
> Rust seems to have a small but consistent performance edge that could 
> quite possibly be down to the kind of optimizations that Arine 
> mentions (that's speculation: I haven't verified). However, these are 
> small differences, not order-of-magnitude stuff.
> 
> I suppose that in a more complicated app there could be some 
> multiplicative impact, but where high-throughput web frameworks are 
> concerned I'm pretty sure that the memory allocation and reuse strategy 
> is going to be what makes 99% of the difference.
> 
> There may also be a bit of an impact from the choice of futures vs. 
> fibers for managing asynchronous tasks (there's a context switching cost 
> for fibers), but I would expect that to only make a difference at the 
> extreme upper end of performance, once other design factors have been 
> addressed.
> 
> BTW, on the memory allocation front, Mathias Lang has pointed out that 
> there is quite a nasty impact from `assumeSafeAppend`. Imagine that your 
> request processing looks something like this:
> 
>      // extract array instance from reusable pool,
>      // and set its length to zero so that you can
>      // write into it from the start
>      x = buffer_pool.get();
>      x.length = 0;
>      assumeSafeAppend(x);   // a cost each time you do this
> 
>      // now append stuff into x to
>      // create your response
> 
>      // now publish your response
> 
>      // with the response published, clean
>      // up by recycling the buffer back into
>      // the pool
>      buffer_pool.recycle(x);
> 
> This is the kind of pattern that Sociomantic used a lot.  In D1 it was 
> easy because there was no array stomping prevention -- you could just 
> set length == 0 and start appending.  But having to call 
> `assumeSafeAppend` each time does carry a performance cost.

In terms of performance, depending on the task at hand, D1 code is 
slower than D2 appending, because D2 has a thread-local cache for 
appending while D1 only has a global one-array cache for the same 
purpose. However, I'm assuming that since you were focused on D1, your 
usage was naturally written to take advantage of what D1 has to offer.

The assumeSafeAppend call also uses this cache, and so it should be 
quite fast. But setting length to 0 is a ton faster, because you aren't 
calling an opaque function.

So depending on the usage pattern, D2 with assumeSafeAppend can be 
faster, or it could be slower.
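
To make that concrete, here is a minimal timing sketch of my own (not 
code from the thread) comparing the two reuse patterns under D2. It 
assumes a recent druntime/Phobos with std.datetime.stopwatch; the 
buffer size and iteration counts are arbitrary:

    import std.datetime.stopwatch : benchmark;
    import std.stdio : writeln;

    void main()
    {
        int[] buf;
        buf.reserve(1024);

        // Pattern A: D2-style reuse -- reset and tell the runtime the
        // block is safe to stomp. assumeSafeAppend is an opaque call,
        // but it hits the thread-local append cache mentioned above.
        void withAssumeSafeAppend()
        {
            buf.length = 0;
            assumeSafeAppend(buf);
            foreach (i; 0 .. 1024)
                buf ~= i;
        }

        // Pattern B: D1-style reuse -- just set length to 0. The reset
        // itself is nearly free, but under D2 the first append cannot
        // stomp the old data, so it reallocates instead of reusing.
        void lengthZeroOnly()
        {
            buf.length = 0;
            foreach (i; 0 .. 1024)
                buf ~= i;
        }

        auto r = benchmark!(withAssumeSafeAppend, lengthZeroOnly)(10_000);
        writeln("reset + assumeSafeAppend: ", r[0]);
        writeln("reset only:               ", r[1]);
    }

Which pattern wins depends on exactly the trade-off described above: 
how often you reset versus how much you append between resets.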

> 
> IIRC Mathias has suggested that it should be possible to tag arrays as 
> intended for this kind of re-use, so that stomping prevention will never 
> trigger, and you don't have to `assumeSafeAppend` each time you reduce 
> the length.

I spoke for a while with Dicebot at DConf 2016 or '17 about this issue. 
IIRC, I suggested either using a custom type or a custom runtime. He 
was not interested in either of these ideas, which makes sense (large 
existing code base, didn't want to stray from mainline D).

By far, the best mechanism to use is a custom type. Not only does that 
fix this problem, since you can implement whatever behavior you want, 
but you also avoid calling opaque functions for appending. It should 
outperform anything you could do in a generic runtime.
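
As a rough illustration of what such a custom type could look like (a 
sketch of my own; the name ReusableBuffer is hypothetical, not an 
existing library type):

    struct ReusableBuffer(T)
    {
        private T[] storage;   // GC-allocated backing block
        private size_t used;   // portion of storage currently in use

        // Start over without touching the runtime's append machinery.
        void reset() { used = 0; }

        // Append one element; only grows the backing block when full.
        void put(T value)
        {
            if (used == storage.length)
                storage.length = storage.length ? storage.length * 2 : 16;
            storage[used++] = value;
        }

        // View of the data written so far.
        inout(T)[] data() inout { return storage[0 .. used]; }
    }

    unittest
    {
        ReusableBuffer!char buf;
        foreach (c; "hello") buf.put(c);
        assert(buf.data == "hello");
        buf.reset();                 // no assumeSafeAppend needed
        foreach (c; "world") buf.put(c);
        assert(buf.data == "world");
    }

The per-append cost is a bounds check and a store; the opaque runtime 
call only happens when the backing block actually has to grow.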

Note that this was before (I think) destructor calls on array elements 
were added. Those destructor calls are something assumeSafeAppend will 
do, and they won't happen if you just set the length to 0.

However, there are other options. We could introduce a druntime 
configuration option so that when this specific situation occurs (the 
slice points at the start of the block and has zero length), 
assumeSafeAppend is called automatically on the first append. Jonathan 
is right that this is not @safe, but it could be opt-in.
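
For illustration only -- this is not the proposed druntime switch 
itself -- the condition it would key on can be checked today in library 
code (the helper name resetForReuse is hypothetical):

    import core.memory : GC;

    void resetForReuse(T)(ref T[] arr)
    {
        arr.length = 0;
        // The proposed opt-in would do this check on the first append:
        // the slice has zero length and points at the start of its block.
        if (arr.ptr !is null &&
            GC.addrOf(cast(void*) arr.ptr) is cast(void*) arr.ptr)
            assumeSafeAppend(arr);
    }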

I don't think configuring specific arrays makes a lot of sense, as this 
would require yet another optional bit that would have to be checked and 
allocated for all arrays.

-Steve

