Scientific computing and parallel computing C++23/C++26
Bruce Carneal
bcarneal at gmail.com
Fri Jan 14 01:37:29 UTC 2022
On Thursday, 13 January 2022 at 23:28:01 UTC, Guillaume Piolat
wrote:
> On Thursday, 13 January 2022 at 20:38:19 UTC, Bruce Carneal
> wrote:
>>
>> Ethan might have a sufficiently compelling economic case for
>> promoting dcompute to his company in the relatively near
>> future. Nicholas recently addressed their need for access to
>> the texture hardware and fitting within their work flow, but
>> there may be other requirements... An adoption by a world
>> class game studio would, of course, be very good news but I
>> think Ethan is slammed (perpetually, and in a mostly good way,
>> I think) so it might be a while.
>
> As a former GPGPU guy: can you explain in what ways dcompute
> improves life over using CUDA and OpenCL through
> DerelictCL/DerelictCUDA (I used to maintain them and I think
> nobody ever used them). Using the API directly seems to offer
> the most control to me, and no special compiler support.
For me there were several things, including:
1) the dcompute kernel invocation was simpler, made more sense,
letting me easily create invocation abstractions to my liking
(partially memoized futures in my case but easy enough to do
other stuff)
2) the kernel meta programming was much friendlier generally,
of course.
3) the D nested function capability, in conjunction with better
meta programming, enabled great intra-kernel decomposition. You
could get the compiler to keep everything within the
maximum-dispatch register limit (64) with ease, with readable
code.
4) using the above I found it easy to reduce/minimize memory
traffic, an important consideration since much of my current
work is memory bound. Trivial example: use static foreach to
logically unroll a window neighborhood algorithm, eliminating
both unnecessary loads and all extraneous reg-to-reg moves as
you naturally mod around.
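To make points 3 and 4 concrete, here is a small host-side D
sketch of the idiom (not code from the actual project, and plain
CPU D rather than a dcompute kernel): a nested function closes
over a fixed-width window, and static foreach unrolls both the
reduction and the window shift at compile time, so each element
is loaded once and the "mod around" becomes pure renaming. The
names (movingSum3, sumWin) and the 3-tap moving sum are my own
illustration.

```d
import std.stdio;

// 3-tap moving sum over a sliding window, fully unrolled.
int[] movingSum3(const int[] input)
{
    enum W = 3;     // compile-time window width
    int[W] win;     // the window; after unrolling, one slot per tap

    // Nested function: decomposes the "kernel" body while still
    // seeing the enclosing window state directly.
    int sumWin()
    {
        int s = 0;
        static foreach (i; 0 .. W)  // unrolled at compile time
            s += win[i];
        return s;
    }

    int[] result;
    if (input.length < W)
        return result;

    static foreach (i; 0 .. W)      // prime the window, unrolled
        win[i] = input[i];

    foreach (pos; W - 1 .. input.length)
    {
        result ~= sumWin();
        static foreach (i; 0 .. W - 1)  // shift = register renaming
            win[i] = win[i + 1];
        if (pos + 1 < input.length)
            win[W - 1] = input[pos + 1];  // one load per element
    }
    return result;
}

void main()
{
    writeln(movingSum3([1, 2, 3, 4, 5])); // [6, 9, 12]
}
```

In a real dcompute kernel the same shape applies, just with the
window held per-lane; the compile-time unrolling is what lets the
compiler keep everything in registers.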
It's not that you can't do such things in CUDA/C++,
eventually, sometimes, after quite a bit of discomfort, once you
acquire your level-bazillion C++ meta programming merit badge;
it's that it's all so much *easier* to do in dcompute. You get
to save the heroics for something else.
I'm sure that new idioms/benefits will emerge with additional use
(this was my first dcompute project) but, as you will have
noticed :-), I'm already hooked.
WRT OpenCL I don't have much to say. From what I gather people
consider OpenCL to be even less hospitable than CUDA, preferring
OpenCL mostly (only?) for its non-proprietary status. I'd be
interested to hear from OpenCL gurus on this topic.
Finally, if any of the above doesn't make sense, or you'd like to
discuss it further, I suggest we meet up at beerconf. I'd also
love to talk about data parallel latency sensitive coding
strategies, about how we should deal with HW capability
variation, about how we can introduce data parallelism to many
more in the dlang community, ...