std.parallel_algorithm
dsimcha
dsimcha at yahoo.com
Mon May 23 04:40:26 PDT 2011
On 5/23/2011 6:10 AM, Russel Winder wrote:
> Jonathan,
>
> On Mon, 2011-05-23 at 02:15 -0700, Jonathan M Davis wrote:
>> On 2011-05-23 02:07, Russel Winder wrote:
>>> On Mon, 2011-05-23 at 08:16 +0200, Andrej Mitrovic wrote:
>>>> Russell, get parallelism.d from here:
>>>> https://github.com/dsimcha/phobos/tree/master/std
>>>
>>> Too many ls in there but I'll assume it's me ;-)
>>>
>>> So the implication is that std.parallelism has changed since the 2.053
>>> release and that std.parallel_algorithm relies on this change?
>>
>> That's what David said in his initial post. He had to make additional changes
>> to std.parallelism for std.parallel_algorithm to work.
>
> Ah, thanks for the reminder. That'll teach me to skim read, get the
> URL, act, and fail to actually note important information.
>
Interesting. Thanks. Unfortunately, it doesn't look like merge and
dotProduct scale to a quad core very well, because I tuned them to be
very close to the break-even point on a dual core. This is part of my
concern: in the worst case, stuff that's very close to break-even on a
dual or quad core can get **slower** when run on a 6- or 8-core
machine. This is
just a fact of life. Eventually, I'd like to push the break-even point
for std.parallelism a little lower, and I know roughly how to do it
(mainly by implementing work stealing instead of a simple FIFO queue).
However, there are tons of nagging little implementation issues, and
there would still always be a break-even point, even if it's lower.
Right now the break-even point seems to be on the order of a few
microseconds per task, which on modern hardware equates to ~10,000 clock
cycles. This is good enough for a lot of practical purposes, but
obviously not nano-parallelism. I've made improving this a fairly low
priority. Given the implementation difficulties, the fact that it's
fairly easy to take full advantage of multicore hardware without getting
down to this granularity (usually, if you need more performance, you're
executing some expensive algorithm in a loop, so you can just
parallelize the outer loop and be done with it; see the sketch below),
and the fact that there are plenty of other useful hacking jobs in the
D ecosystem, the benefits-to-effort ratio doesn't seem that good.
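
A minimal sketch of that "parallelize the outer loop" pattern using
std.parallelism's parallel foreach. The array size, inner loop, and
work unit size are made-up illustrative values, not anything tuned
from std.parallel_algorithm:

import std.math : sqrt;
import std.parallelism : taskPool;
import std.stdio : writeln;

void main()
{
    auto results = new double[](10_000);

    // Each outer iteration does microseconds' worth of work, so it sits
    // comfortably above the break-even point; the work unit size of 100
    // is a guess, not a tuned value.
    foreach (i, ref r; taskPool.parallel(results, 100))
    {
        double sum = 0;
        // Stand-in for "some expensive algorithm" run per element.
        foreach (j; 0 .. 10_000)
            sum += sqrt(cast(double) (i * j + 1));
        r = sum;
    }

    writeln("first result: ", results[0]);
}

Only the outer foreach is parallelized; the inner loop stays serial, so
each task stays coarse enough that queueing overhead doesn't dominate.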