D performance

Mathias LANG geod24 at gmail.com
Mon Apr 27 05:04:18 UTC 2020


On Sunday, 26 April 2020 at 16:20:19 UTC, Steven Schveighoffer 
wrote:
>
> In terms of performance, depending on the task at hand, D1 code 
> is slower than D2 appending, by the fact that there's a 
> thread-local cache for appending for D2, and D1 only has a 
> global one-array cache for the same. However, I'm assuming that 
> since you were focused on D1, your usage naturally was written 
> to take advantage of what D1 has to offer.
>
> The assumeSafeAppend call also uses this cache, and so it 
> should be quite fast. But setting length to 0 is a ton faster, 
> because you aren't calling an opaque function.
>
> So depending on the usage pattern, D2 with assumeSafeAppend can 
> be faster, or it could be slower.

Well, Sociomantic didn't use any kind of multi-threading in "user 
code".
We had single-threaded fibers for concurrency, and process-level 
scaling for parallelism.
A few corner cases did use threads, but only for low-level things 
(e.g. low-latency file IO on Linux), and those were highly 
scrutinized and stayed clear of the GC AFAIK.

Note that accessing TLS *does* have a higher cost than accessing 
a global. By this reasoning, I would assume that D2 appending 
would definitely be slower, although I never profiled it. What I 
did profile, though, is `assumeSafeAppend`. The fact that it 
looks up GC metadata (taking the GC lock in the process) made it 
quite expensive given how often it was called (in D1 it was 
simply a no-op, called defensively).
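
To make that concrete, here is a minimal sketch (illustrative 
names, not actual Sociomantic code) of the kind of hot-path 
buffer recycling where a per-call `assumeSafeAppend` adds up:

```d
// Minimal sketch of hot-path buffer recycling. With D2's stomping
// prevention, shrinking the slice makes the runtime forget its capacity,
// so the next ~= would reallocate unless assumeSafeAppend is called
// first; that call is where the GC metadata lookup (and GC lock) shows
// up in profiles.
void render(ref char[] buf, const(char[])[] fields)
{
    buf.length = 0;         // recycle the existing allocation
    buf.assumeSafeAppend(); // per-call cost: GC metadata lookup
    foreach (f; fields)
        buf ~= f;           // appends now extend the block in place
}
```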

>> IIRC Mathias has suggested that it should be possible to tag 
>> arrays as intended for this kind of re-use, so that stomping 
>> prevention will never trigger, and you don't have to 
>> `assumeSafeAppend` each time you reduce the length.
>
> I spoke for a while with Dicebot at Dconf 2016 or 17 about this 
> issue. IIRC, I suggested either using a custom type or custom 
> runtime. He was not interested in either of these ideas, and it 
> makes sense (large existing code base, didn't want to stray 
> from mainline D).
>
> By far, the best mechanism to use is a custom type. Not only 
> will that fix this problem as you can implement whatever 
> behavior you want, but you also do not need to call opaque 
> functions for appending either. It should outperform everything 
> you could do in a generic runtime.

Well... Here's something I never really quite understood, 
actually:
Mihails *did* introduce a buffer type. See 
https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/core/Buffer.d#L116-L130
And we also had a (very old) similar utility here: 
https://github.com/sociomantic-tsunami/ocean/blob/36c9fda09544ee5a0695a74186b06b32feda82d4/src/ocean/util/container/ConcatBuffer.d
I always wanted to unify these two, but never got to it. If you 
look at the first link, it calls `assumeSafeAppend` twice, before 
and after setting the length. In practice it is only necessary 
*after* reducing the length, but as I mentioned, this is 
defensive programming.
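
Concretely, the distinction is something like the following (a 
hedged sketch of the pattern, not the actual ocean code):

```d
// Only the assumeSafeAppend *after* reducing the length is required to
// keep subsequent ~= appending in place; a call before the shrink is the
// defensive part.
void shrinkTo(ref ubyte[] buf, size_t newLength)
{
    assert(newLength <= buf.length);
    buf.length = newLength; // shrinking drops the runtime's capacity tracking
    buf.assumeSafeAppend(); // re-establish it so later appends stay in place
}
```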

For reference, most of our applications made principled use of 
buffers. A given buffer would rarely be appended to from more 
than one, perhaps two, places. However, slices of the buffer 
would be passed around quite liberally. So a buffer type from 
which one could borrow would indeed have been optimal.
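
Something along these lines is what I have in mind; purely a 
hypothetical sketch, not what ocean's `Buffer` or `ConcatBuffer` 
actually look like:

```d
// Hypothetical borrowable buffer: appends go through the owner, while
// read-only slices can be handed out liberally without stomping
// prevention kicking in on the callers' side.
struct ReusableBuffer
{
    private char[] data;

    /// Reset for reuse without releasing the memory back to the GC.
    void reset()
    {
        this.data.length = 0;
        this.data.assumeSafeAppend();
    }

    /// Append, reusing the existing allocation when capacity allows.
    void put(in char[] chunk)
    {
        this.data ~= chunk;
    }

    /// Borrow a read-only view; callers are expected not to append to it.
    const(char)[] borrow() const
    {
        return this.data;
    }
}
```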

> Note that this was before (I think) destructor calls were 
> added. The destructor calls are something that assumeSafeAppend 
> is going to do, and won't be done with just setting length to 0.
>
> However, there are other options. We could introduce a druntime 
> configuration option so when this specific situation happens 
> (slice points at start of block and has 0 length), 
> assumeSafeAppend is called automatically on the first append. 
> Jonathan is right that this is not @safe, but it could be an 
> opt-in configuration option.
>
> I don't think configuring specific arrays makes a lot of sense, 
> as this would require yet another optional bit that would have 
> to be checked and allocated for all arrays.
>
> -Steve

I don't even know if we had a single case where we had arrays of 
objects with destructors. The vast majority of our buffers were 
`char[]` and `ubyte[]`. We had some elaborate types, but I think 
destructors + buffers would have been frowned upon in code review.

Also, the reason we didn't modify druntime to just restore the D1 
behavior (that would have been a trivial change) was because of 
how dependent on the new behavior druntime itself had become. 
That was also the motivation for the suggestion Joe mentioned. 
AFAIR I mentioned it in an internal issue and did a PoC 
implementation, but never got it to a state where it was 
mergeable.

Also, while a custom type might sound better, it doesn't really 
interact well with the rest of the runtime, and it's an extra 
word to pass around (if passed by value).
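
For the size point, a rough illustration (hypothetical layout, 
just to show where the extra word comes from):

```d
// A built-in slice is two words (pointer + length); a custom buffer that
// also carries its capacity needs a third word when passed by value.
struct MyBuffer
{
    char*  ptr;
    size_t length;
    size_t capacity; // the extra word compared to a plain char[]
}

static assert((char[]).sizeof == 2 * size_t.sizeof);
static assert(MyBuffer.sizeof == 3 * size_t.sizeof);
```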

