Advice on threading/fibers/?

Thu Jun 16 07:45:44 PDT 2011

On 06/16/2011 06:22 AM, Lars T. Kyllingstad wrote:
> On Wed, 15 Jun 2011 23:57:25 +0000, Justin Whear wrote:
>
>> Consider the following:
>>
>> You have 10 million data points and you need to apply a multipass
>> algorithm to them. Each pass is like a cellular automata: it can read
>> from the previous pass but it doesn't know the "current" values. This
>> makes the actual processing of each value trivially parallelizable. The
>> actual operation for each value is fairly simple and cheap (essentially
>> a multidimensional ancestor-child averaging operation).
>>
>> After each value has been operated on once, the pass is complete and the
>> "current" and "old" buffers are switched (conceptually, the "current"
>> buffer can only be written to, the "old" buffer can only be read--using
>> __gshared here).
>>
>> The number of passes is not fixed; in the course of each value
>> operation, an error is computed. When the worst individual error falls
>> below a certain threshold, the algorithm is finished. Generally this
>> will take between one thousand and ten thousand passes.
>>
>> How would you go about parallelizing this? My thought is to take the
>> map/reduce approach within each pass: each thread/fiber takes a slice of
>> the dataset, makes its modifications, then returns an error summary.
>> These summaries are quickly combined and the algorithm loop decides
>> whether to run again. Each pass shouldn't take more than a second or
>> two, so I'm not sure whether introducing the overhead of spawning, say,
>> 10 threads each pass is worthwhile (times 5000 passes). On the other
>> hand, I have plenty of CPUs to throw at it (at least 16 cores, each with
>> hyperthreading) and am in a situation where "as fast as possible" is
>> important (while individual datasets may not grow, the number of them
>> is).
>>
>> Any thoughts appreciated.
>
> I would recommend you take a look at the new std.parallelism module,
> which was introduced in the most recent DMD release (2.053):
>
> http://www.d-programming-language.org/phobos-prerelease/
> std_parallelism.html
>
> -Lars
I wrote an article on using std.parallelism for a bucket-sort algorithm. 
  http://www.codestrokes.com/archives/116

I hope it helps.