Scientific computing and parallel computing C++23/C++26
Bruce Carneal
bcarneal at gmail.com
Fri Jan 14 01:37:29 UTC 2022
On Thursday, 13 January 2022 at 23:28:01 UTC, Guillaume Piolat
wrote:
> On Thursday, 13 January 2022 at 20:38:19 UTC, Bruce Carneal
> wrote:
>>
>> Ethan might have a sufficiently compelling economic case for
>> promoting dcompute to his company in the relatively near
>> future. Nicholas recently addressed their need for access to
>> the texture hardware and fitting within their work flow, but
>> there may be other requirements... An adoption by a world
>> class game studio would, of course, be very good news but I
>> think Ethan is slammed (perpetually, and in a mostly good way,
>> I think) so it might be a while.
>
> As a former GPGPU guy: can you explain in what ways dcompute
> improves life over using CUDA and OpenCL through
> DerelictCL/DerelictCUDA (I used to maintain them and I think
> nobody ever used them). Using the API directly seems to offer
> the most control to me, and no special compiler support.
For me there were several things, including:
1) the dcompute kernel invocation was simpler, made more sense,
letting me easily create invocation abstractions to my liking
(partially memoized futures in my case but easy enough to do
other stuff)
2) the kernel meta programming was much friendlier generally,
of course.
3) the D nested function capability, in conjunction with better
meta programming, enabled great intra-kernel decomposition. You
could get the compiler to keep everything within the
maximum-dispatch register limit (64) with ease, with readable
code.
4) using the above I found it easy to reduce/minimize memory
traffic, an important consideration since much of my current
work is memory bound. Trivial example: use static foreach to
logically unroll a window neighborhood algorithm, eliminating
both unnecessary loads and all extraneous reg-to-reg moves as
you naturally mod around.
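To make points 3 and 4 concrete, here is a small host-side D
sketch of the idiom (not code from the actual project, and plain
CPU D rather than a dcompute kernel): a nested function closes
over a fixed-width window, and static foreach unrolls both the
reduction and the window shift at compile time, so each element
is loaded once and the "mod around" becomes pure renaming. The
names (movingSum3, sumWin) and the 3-tap moving sum are my own
illustration.

```d
import std.stdio;

// 3-tap moving sum over a sliding window, fully unrolled.
int[] movingSum3(const int[] input)
{
    enum W = 3;     // compile-time window width
    int[W] win;     // the window; after unrolling, one slot per tap

    // Nested function: decomposes the "kernel" body while still
    // seeing the enclosing window state directly.
    int sumWin()
    {
        int s = 0;
        static foreach (i; 0 .. W)  // unrolled at compile time
            s += win[i];
        return s;
    }

    int[] result;
    if (input.length < W)
        return result;

    static foreach (i; 0 .. W)      // prime the window, unrolled
        win[i] = input[i];

    foreach (pos; W - 1 .. input.length)
    {
        result ~= sumWin();
        static foreach (i; 0 .. W - 1)  // shift = register renaming
            win[i] = win[i + 1];
        if (pos + 1 < input.length)
            win[W - 1] = input[pos + 1];  // one load per element
    }
    return result;
}

void main()
{
    writeln(movingSum3([1, 2, 3, 4, 5])); // [6, 9, 12]
}
```

In a real dcompute kernel the same shape applies, just with the
window held per-lane; the compile-time unrolling is what lets the
compiler keep everything in registers.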
It's not that you can't do such things in CUDA/C++,
eventually, sometimes, after quite a bit of discomfort, once you
acquire your level-bazillion C++ meta programming merit badge;
it's that it's all so much *easier* to do in dcompute. You get
to save the heroics for something else.
I'm sure that new idioms/benefits will emerge with additional use
(this was my first dcompute project) but, as you will have
noticed :-), I'm already hooked.
WRT OpenCL I don't have much to say. From what I gather people
consider OpenCL to be even less hospitable than CUDA, preferring
OpenCL mostly (only?) for its non-proprietary status. I'd be
interested to hear from OpenCL gurus on this topic.
Finally, if any of the above doesn't make sense, or you'd like to
discuss it further, I suggest we meet up at beerconf. I'd also
love to talk about data parallel latency sensitive coding
strategies, about how we should deal with HW capability
variation, about how we can introduce data parallelism to many
more in the dlang community, ...