std.parallelism: Request for Review
dsimcha
dsimcha at yahoo.com
Sun Feb 27 06:48:05 PST 2011
On 2/27/2011 8:03 AM, Russel Winder wrote:
> 32-bit mode on an 8-core (twin Xeon) Linux box. That core.cpuid bug
> really, really sucks.
>
> I see matrix inversion takes longer with 4 cores than with 1!
Can you please re-run the benchmark to make sure that this isn't just a
one-time anomaly? I can't seem to make the parallel matrix inversion
run slower than serial on my hardware, even with ridiculous tuning
parameters that I was almost sure would bottleneck the thing on the task
queue. Also, all the other benchmarks actually look pretty good.
It's possible that machines with multiple physical CPUs are much more
likely to bottleneck on the task queue, because a synchronized block costs
a few more clock cycles when its lock has to be passed between physical
CPUs. It's also possible that stack alignment issues are creeping in
somewhere I hadn't anticipated, or that using four cores instead of two
on a fairly fine-grained benchmark is enough to bottleneck on the queue
(though I doubt this, because the same benchmark has worked well for
others on quad-core machines).
More information about the Digitalmars-d mailing list