First time using Parallel

max haughton maxhaton at gmail.com
Sun Dec 26 11:10:44 UTC 2021


On Sunday, 26 December 2021 at 06:10:03 UTC, Era Scarecrow wrote:
>  This is curious. I was up for trying to parallelize my code, 
> specifically having a block of code calculate some polynomials 
> (*Related to Reed Solomon stuff*). So I cracked open 
> std.parallelism and looked over how I would manage this all.
>
>  To my surprise I found ParallelForEach, which gives the 
> example of:
>
> ```d
> foreach(value; taskPool.parallel(range) ){code}
> ```
>
> Since my code doesn't require any memory management, shared 
> resources or race conditions (*other than stdout*), I plugged 
> in an iota and gave it a go. To my amazement no compiling 
> issues, and all my cores are in heavy use and it's outputting 
> results!
>
>  Now said results are out of order (*and early results are 
> garbage from stdout*), but I'd included a bitwidth comment so 
> sorting should be easy.
> ```d
>         0x3,    /*7*/
>         0x11,   /*9*/
>         0x9,    /*10*/
>         0x1D,   /*8*/
>         0x5,    /*11*/
>         0x3,    /*15*/
>         0x53,   /*12*/
>         0x1B,   /*13*/
>         0x2B,   /*14*/
> ```
> etc etc.
>
>  Years ago I remember having to make a struct and then pass a 
> function and a bunch of state from within the struct; it often 
> broke and was hard to get working, so I hardly touched this 
> stuff. This approach makes outputting data MUCH faster and so 
> easily; well, at least on a beefy computer rather than the 
> Chromebook I program on when I'm on the go.
>
>
>  So I suppose, is there anything I need to know? About shared 
> resources or how to wait until all threads are done?

Parallel programming is one of the deepest rabbit holes you can 
actually get to use in practice. At the moment your question 
doesn't have much context to it, so it's difficult to suggest 
where you should go next.

I would start by removing the use of stdout from your loop 
kernel. I'm not familiar with what you are calculating, but if 
you can have the (parallel) loop operate from (say) one array 
directly into another, then you can get extremely good parallel 
scaling with almost no effort.

Not using stdout in the actual loop should make the code faster 
even without threads, because a function call in the hot code 
will make the compiler's optimizer give up on certain 
transformations. In other words: do all the work as compactly as 
possible, then output the data in one step at the end.
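As a sketch of that pattern (the computation below is a 
hypothetical stand-in for your polynomial search, not your actual 
kernel): each iteration writes to its own slot of a preallocated 
array, so there are no shared resources, and since the parallel 
foreach only returns after every iteration has finished, printing 
afterwards needs no extra synchronization.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical stand-in for the per-index computation in the
// original post.
ulong someComputation(size_t i)
{
    return (1UL << i) | 1;
}

void main()
{
    enum n = 16;
    auto results = new ulong[n]; // one slot per iteration: no races

    // The parallel foreach blocks until all iterations complete,
    // so nothing below runs while worker threads are still busy.
    foreach (i; taskPool.parallel(iota(n)))
    {
        results[i] = someComputation(i); // no I/O in the hot loop
    }

    // Output once, in order, after all the work is done.
    foreach (i, r; results)
        writefln("0x%X,\t/*%s*/", r, i);
}
```

This also sidesteps the garbled early output you saw: interleaved 
writes to stdout from several threads are gone, and the results 
come out already sorted by index.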


More information about the Digitalmars-d-learn mailing list