multithread/concurrency/parallel methods and performance
SrMordred
patric.dexheimer at gmail.com
Mon Feb 19 14:50:17 UTC 2018
On Monday, 19 February 2018 at 05:49:54 UTC, Nicholas Wilson
wrote:
> As SIZE=1024*1024 (i.e. not much, possibly well within L2 cache
> for 32bit) it may be that dealing with the concurrency overhead
> adds a significant amount of overhead.
That 'concurrency overhead' is what I'm not getting.
Since the array is big, splitting it across 6 or 7 cores should not
thrash L1, because the chunks are very far apart from each other,
right? Or is L2 cache thrashing also a problem in this case?
> _base : 150 ms, 728 μs, and 5 hnsecs
> _parallel : 120 ms, 78 μs, and 5 hnsecs
> _concurrency : 134 ms, 787 μs, and 4 hnsecs
> _thread : 129 ms, 476 μs, and 2 hnsecs
>
Yes, on my PC I was using -release.
Still, that is 150 ms on 1 core versus 120-134 ms on X cores.
Shouldn't it be way faster? I'm trying to understand where the
overhead is, and whether it is possible to get rid of it (perfect
thread scaling).
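
To put numbers on that gap, here is a minimal sketch in D that turns
the timings quoted above into a speedup and a per-core efficiency.
The core count of 6 is only an assumption (the quoted timings do not
say how many cores were used), and the helper names speedup and
efficiency are hypothetical, made up for this illustration.

```d
import std.stdio : writefln;

/// Speedup of a parallel run relative to the single-core baseline.
double speedup(double baseMs, double parallelMs) pure nothrow @nogc
{
    return baseMs / parallelMs;
}

/// Fraction of ideal (linear) scaling achieved on `cores` cores.
double efficiency(double baseMs, double parallelMs, uint cores) pure nothrow @nogc
{
    return speedup(baseMs, parallelMs) / cores;
}

void main()
{
    // Timings quoted above, converted to milliseconds.
    immutable double baseMs = 150.7285;
    immutable string[] names = ["_parallel", "_concurrency", "_thread"];
    immutable double[] timesMs = [120.0785, 134.7874, 129.4762];

    // Assumption: 6 cores. Change this to match the actual machine.
    enum uint cores = 6;

    foreach (i, name; names)
    {
        writefln("%-13s speedup %.2fx, efficiency %.0f%%",
                 name,
                 speedup(baseMs, timesMs[i]),
                 efficiency(baseMs, timesMs[i], cores) * 100);
    }
}
```

Perfect thread scaling would mean a speedup equal to the core count
and an efficiency of 100%, so the printed values show how far these
runs are from that.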