multithread/concurrency/parallel methods and performance

Mon Feb 19 14:50:17 UTC 2018

On Monday, 19 February 2018 at 05:49:54 UTC, Nicholas Wilson 
wrote:
> As SIZE=1024*1024 (i.e. not much, possibly well within L2 cache 
> for 32bit) it may be that dealing with the concurrency overhead 
> adds a significant amount of overhead.

That 'concurrency overhead' is what I'm not getting.
Since the array is big, dividing it among 6 or 7 cores should not 
thrash L1, since the chunks are very far from each other, right? 
Or is L2 cache thrashing also a problem in this case?

> _base : 150 ms, 728 μs, and 5 hnsecs
> _parallel : 120 ms, 78 μs, and 5 hnsecs
> _concurrency : 134 ms, 787 μs, and 4 hnsecs
> _thread : 129 ms, 476 μs, and 2 hnsecs
>

Yes, on my PC I was using -release.

Yet that is 150 ms on 1 core versus 120-134 ms on X cores.
Shouldn't it be way faster? I'm trying to understand where the 
overhead is, and whether it is possible to get rid of it (perfect 
thread scaling).
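
To put rough numbers on that gap, here is a back-of-the-envelope calculation from the timings quoted above. The core count N = 6 is only an assumption for illustration (the quoted figures do not say how many cores were used), so the "ideal" time and the efficiency are hypothetical:

```latex
% Speedup relative to _base, using the quoted timings (in ms)
S_{\text{parallel}}    = \frac{150.728}{120.078} \approx 1.255
S_{\text{thread}}      = \frac{150.728}{129.476} \approx 1.164
S_{\text{concurrency}} = \frac{150.728}{134.787} \approx 1.118

% Perfect scaling on N cores would give T_N = T_1 / N.
% Assumption: N = 6 (illustrative only)
T_6^{\text{ideal}} = \frac{150.728}{6} \approx 25.1\ \text{ms}

% Parallel efficiency of the best case under that assumption
E = \frac{S_{\text{parallel}}}{N} = \frac{1.255}{6} \approx 0.21
```

So even the best variant gets only about a quarter more throughput than one core, roughly 21% efficiency if 6 cores were really in use. That is the difference I am trying to account for.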