std.parallelism: Request for Review

dsimcha dsimcha at yahoo.com
Sun Feb 27 06:48:05 PST 2011


On 2/27/2011 8:03 AM, Russel Winder wrote:
> 32-bit mode on a 8-core (twin Xeon) Linux box.  That core.cpuid bug
> really, really sucks.
>
> I see matrix inversion takes longer with 4 cores than with 1!

Can you please re-run the benchmark to make sure that this isn't just a 
one-time anomaly?  I can't make the parallel matrix inversion run slower 
than serial on my hardware, even with ridiculous tuning parameters that 
I was almost sure would bottleneck it on the task queue.  Also, all the 
other benchmarks actually look pretty good.

It's possible that machines with multiple physical CPUs are much more 
likely to bottleneck on the task queue because synchronized blocks cost 
a few more clock cycles there.  It's also possible that stack alignment 
issues are creeping in somewhere I hadn't anticipated, or that using 
four cores instead of two on a fairly fine-grained benchmark is enough 
to bottleneck on the queue (though I doubt this, because this benchmark 
worked well for others on quad-core machines).
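
To make the queue-contention hypothesis concrete, here is a minimal 
sketch (not the matrix inversion benchmark itself) that times the same 
trivial loop with different work unit sizes.  The function name fill, 
the array length and the three sizes are made up for illustration; the 
only library calls assumed are taskPool.parallel and StopWatch from a 
current Phobos.  A work unit size of 1 makes each worker go back to the 
shared queue after every element, so if queue synchronization is the 
bottleneck, that run should be the one that suffers most.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : taskPool;
import std.stdio : writefln;

// Hypothetical helper: fill `results` in parallel, handing each worker
// `workUnitSize` consecutive elements per trip to the task queue.
void fill(double[] results, size_t workUnitSize)
{
    foreach (i, ref x; taskPool.parallel(results, workUnitSize))
        x = cast(double) i * i; // deliberately tiny amount of work per element
}

void main()
{
    auto results = new double[](1_000_000);

    // Illustrative sizes only: from very fine-grained to coarse.
    size_t[] sizes = [1, 100, 10_000];
    foreach (workUnitSize; sizes)
    {
        auto sw = StopWatch(AutoStart.yes);
        fill(results, workUnitSize);
        writefln("work unit size %6d: %s", workUnitSize, sw.peek);
    }
}
```

Timings will vary by machine, so treat the numbers as a rough indicator 
only.  Comparing the three runs on a single-socket box and on a 
multi-socket box would be one way to check whether the extra cost 
really comes from the queue.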
