D Language Foundation October 2023 Quarterly Meeting Summary
Bastiaan Veelo
Bastiaan at Veelo.net
Sun Dec 10 15:08:05 UTC 2023
On Wednesday, 6 December 2023 at 16:28:08 UTC, Mike Parker wrote:
> Bastiaan reported that SARC had been testing their D codebase
> (transpiled from Pascal---[see Bastiaan's DConf 2019
> talk](https://youtu.be/HvunD0ZJqiA)). They'd found the
> multithreaded performance worse than the Pascal version. He
> said that execution time increased with more threads and that
> it didn't matter how many threads you throw at it. It's the
> latter problem he was focused on at the moment.
I have an update on this issue. But first let me clarify how
grave this situation is (was!) for us. There are certain tasks
that we, and our customers, need to perform that keep a computer
with 20 logical cores crunching numbers for a week. This is
painful, but it also means that a doubling of that time is
completely unacceptable, let alone a 20-fold increase. It is the
difference between being in business and being out of business.
Aside from the allocation issue, there are several other
properties that our array implementation needs to replicate from
Extended Pascal: non-zero starting indices, value semantics, array
limits that can be known either at compile time or only at run
time, and function parameters that work on arrays of any limits,
including multi-dimensional arrays. So while trying to solve one
aspect, we had to take care not to break any of the others.
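To make a few of these properties concrete, here is a minimal
sketch in D. It is not SARC's implementation: the names `PArray`
and `sum` are hypothetical, and it only covers compile-time
limits, a non-zero starting index, value semantics, and one
function that accepts arrays with any (compile-time) limits.
Run-time limits and multi-dimensional arrays are left out.

```d
import std.stdio : writeln;

/// Hypothetical sketch: an array indexed lo .. hi inclusive, like
/// Extended Pascal's `array [lo..hi] of T`. Limits are compile-time only.
struct PArray(T, long lo, long hi)
{
    static assert(hi >= lo, "empty index range");

    enum long low = lo;
    enum long high = hi;
    enum size_t length = cast(size_t)(hi - lo + 1);

    // A static array member makes the whole struct a value type:
    // assignment and pass-by-value copy all elements.
    private T[length] data;

    // Translate the Pascal-style index into a 0-based offset.
    ref inout(T) opIndex(long i) inout
    {
        assert(i >= lo && i <= hi, "index out of range");
        return data[cast(size_t)(i - lo)];
    }
}

// Templated on the limits, so one function accepts arrays with any bounds.
T sum(T, long lo, long hi)(const ref PArray!(T, lo, hi) a)
{
    T s = 0;
    foreach (i; lo .. hi + 1)
        s += a[i];
    return s;
}

void main()
{
    PArray!(int, 1, 3) a;
    foreach (i; a.low .. a.high + 1)
        a[i] = cast(int)(i * 10);

    auto b = a;      // copies all elements (value semantics)
    b[1] = 99;       // does not affect a

    writeln(sum(a)); // prints 60
    writeln(sum(b)); // prints 149
}
```

Using a static array as the storage gives value semantics for
free. Making `sum` a template over the limits is only one way to
get "any limits" parameters. It does not address limits that are
known only at run time.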
It turned out that the thread contention had more than one cause,
which made this an extra frustrating problem: just when we thought
we had found the culprit, fixing it did not have the effect we
expected.
These were the three major causes of the heavy thread contention
we were seeing, in no particular order:
1) Missing `scope` storage class specifiers on `delegate`
function arguments. This can be chalked up to a beginner error,
but it is also an easy one to miss. If you didn't know: without
`scope`, the compiler cannot be sure that the delegate is not
stored in some variable that outlives the stack frame its context
pointer refers to (the frame of the function that encloses the
nested function). Therefore, a dynamic closure is created, which
means that the captured variables are placed in newly GC-allocated
memory instead of on the stack, and every such allocation competes
for the global GC lock. In the majority of our cases, delegate
arguments are simple callbacks that never leave the stack, but a
select number of delegates in the GUI are stored for longer. The
compiler can check whether `scope` delegates escape a function,
but it only does this in `@safe` code, and our code is far from
being `@safe`. So it was a bit of a puzzle to find out which
arguments needed to be `scope` and which couldn't be.
2) Allocating heap memory in the array implementation, as
discussed in the meeting. We followed Walter's advice and now use
`alloca`, though not directly: string mixins and static member
functions generate the appropriate code.
3) Stale calls to `GC.addRange` and `GC.removeRange`. These were
left over from an experiment where we tried to circumvent the
garbage collector. Not realizing these calls were still in there,
we were puzzled to see contention even in code that was marked
`@nogc`. It makes sense now: even though `addRange` doesn't
allocate, it does need the global GC lock to register the range
safely. Because the stack is already scanned by default, these
calls were superfluous and could be removed.
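The effect described in item 1 can be observed with a small
sketch. All function names here are hypothetical, and it assumes a
druntime recent enough to provide
`core.memory.GC.allocatedInCurrentThread`. The sketch counts the
GC bytes allocated while a function passes a capturing delegate
literal to a callee, once without `scope` and once with it. With
DMD, the version without `scope` is expected to allocate a
closure, while the `scope` version should allocate nothing.

```d
import core.memory : GC;
import std.stdio : writeln;

// Callee WITHOUT `scope`: the compiler must assume `dg` may be stored
// somewhere that outlives the caller's stack frame.
void forEachEscaping(int n, void delegate(int) dg)
{
    foreach (i; 0 .. n)
        dg(i);
}

// Callee WITH `scope`: `dg` is promised not to outlive this call.
void forEachScoped(int n, scope void delegate(int) dg)
{
    foreach (i; 0 .. n)
        dg(i);
}

// The delegate literal captures the local `total`. Because the callee's
// parameter is not `scope`, `total` is expected to live in a GC closure.
int sumEscaping(int n)
{
    int total;
    forEachEscaping(n, (int i) { total += i; });
    return total;
}

// Same body, but the callee takes a `scope` delegate, so no closure is needed.
int sumScoped(int n)
{
    int total;
    forEachScoped(n, (int i) { total += i; });
    return total;
}

// Bytes the GC allocated in the current thread while running fun(n).
ulong gcBytes(alias fun)(int n)
{
    immutable before = GC.allocatedInCurrentThread;
    fun(n);
    return GC.allocatedInCurrentThread - before;
}

void main()
{
    writeln("without scope: ", gcBytes!sumEscaping(10), " bytes");
    writeln("with scope:    ", gcBytes!sumScoped(10), " bytes");
}
```

In a multithreaded program, each of those closure allocations is a
trip through the GC. Adding `scope` to the callee's parameter
removes the allocation without touching the call sites.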
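Item 2 mentions generating the `alloca` calls with string mixins
rather than calling `alloca` directly. One plausible reason is
that memory from `alloca` is released when the function that
called it returns. A helper *function* would therefore hand back a
dangling slice, whereas a mixin pastes the call into the caller
itself. The sketch below illustrates that idea only. The helper
`stackArray` is hypothetical and is not the generator used in our
code. It also relies on `T.stringof`, which only suffices for
simple built-in types like `double`.

```d
import core.stdc.stdlib : alloca;
import std.stdio : writeln;

// Hypothetical helper: returns the *source text* of a declaration that
// reserves `count` elements of T on the stack. It must be mixed in, not
// called at run time, so that alloca runs in the frame that uses the memory.
string stackArray(T)(string name, string count)
{
    return "auto " ~ name ~ " = (cast(" ~ T.stringof ~ "*) alloca("
        ~ count ~ " * " ~ T.stringof ~ ".sizeof))[0 .. " ~ count ~ "];";
}

// @nogc compiles because the buffer comes from alloca, not the GC.
double sumOfSquares(size_t n) @nogc
{
    // Expands to:
    // auto buf = (cast(double*) alloca(n * double.sizeof))[0 .. n];
    mixin(stackArray!double("buf", "n"));

    foreach (i, ref x; buf)
        x = i;
    double s = 0;
    foreach (x; buf)
        s += x * x;
    return s;
}

void main()
{
    writeln(sumOfSquares(4)); // prints 14
}
```

Because `stackArray` is evaluated at compile time inside `mixin`,
its string concatenation does not count against the `@nogc`
function that uses it.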
So now all cores are finally under full load, which is a
magnificent sight! The speed of our DMD `release-nobounds` build
is on par with our Pascal version, if not slightly faster. We are
looking forward to being able to use LDC safely, because tests
show that it has the potential to at least double the performance.
A big sigh of relief from us, as we have (hopefully!) cleared the
biggest hurdle on our way to full adoption of D.
-- Bastiaan.
More information about the Digitalmars-d-announce mailing list