ldc/dcompute atomics for nvptx?
Bruce Carneal
bcarneal at gmail.com
Tue Apr 6 02:23:59 UTC 2021
I'd like to use atomic (rmw) operations from within ldc while
targeting nvptx (via dcompute).
The first place to check is dcompute.std.atomic. That's a nice
placeholder, but only a placeholder, so I started poking around
in ldc and clang. After a modest amount of poking I'm still not
sure how to proceed.
If you know of a simple way to bring atomics online for
dcompute/nvptx, I'd like to hear from you. Alternatively, if you
know why nvptx atomics will be hard to bring online, I'd also
like to hear from you.
On a positive note, I've had some success using dcompute/D's
metaprogramming facilities to rework areal/stencil compute
kernels so that they operate out of "arrays of registers". You
meta-unroll until you wrap around the stencil, avoiding moves, and
you can use intra-warp shuffles to/from lateral neighbors to
minimize load on the memory subsystem when rolling on to the next row.
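To make the meta-unrolling idea concrete, here is a minimal host-side sketch in plain D. It is not a dcompute kernel and is not the code described above. The function name stencil3, the 3-point weights, and the window size are hypothetical, chosen only for illustration, and the intra-warp shuffle step is left out because it needs a GPU target. The point is only that a compile-time length lets `static foreach` expand the loop into straight-line statements with constant indices, which is the form that gives a compiler the chance to keep a small fixed-size array in registers.

```d
// Hypothetical 3-point stencil over a small fixed-size window.
// N is a compile-time constant, so `static foreach` expands into
// N - 2 separate statements, each with constant indices.
float[N - 2] stencil3(size_t N)(float[N] win)
    if (N >= 3)
{
    float[N - 2] result;
    static foreach (i; 1 .. N - 1)
    {
        // Illustrative weights only (0.25, 0.5, 0.25).
        result[i - 1] = 0.25f * win[i - 1] + 0.5f * win[i] + 0.25f * win[i + 1];
    }
    return result;
}

void main()
{
    float[5] window = [1, 2, 3, 4, 5];
    auto r = stencil3(window);   // N is inferred as 5
    assert(r == [2.0f, 3.0f, 4.0f]);
}
```

Whether the window actually stays in registers on a given target is up to the backend, so checking the generated code for spills is still needed.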
Another D advantage over CUDA/C++ that can be exploited is nested
functions. You can declare variables at the outer function level
where they'll pretty much all be mapped to registers (you've got
at least 64 per SIMT "lane" to work with, and it's easy to check
for spills). You can then access those enregistered variables
directly from within the nested functions. Sometimes it's nice
not having to pass everything through an argument list.
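The nested-function pattern can be sketched in ordinary host-side D as follows. This is an illustration only, not a dcompute kernel: the names rowMean and accumulate are hypothetical, and nothing here says how a GPU backend will allocate the variables. It shows just the language feature, a nested function reading and writing the enclosing function's locals without an argument list.

```d
// Hypothetical example: integer mean of a row, using a nested helper.
int rowMean(const int[] row)
{
    // Outer-level state, declared once in the enclosing function.
    int acc = 0;
    int count = 0;

    // Nested function: updates acc and count directly,
    // so only the new value has to be passed in.
    void accumulate(int v)
    {
        acc += v;
        ++count;
    }

    foreach (v; row)
        accumulate(v);

    return count > 0 ? acc / count : 0;
}

void main()
{
    assert(rowMean([2, 4, 6]) == 4);
    assert(rowMean([]) == 0);
}
```

Because accumulate is only called directly and never escapes rowMean, the outer variables remain ordinary locals of the enclosing function.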
Thanks again to the ldc/dcompute team for providing the tooling
that makes the above possible. And thanks in advance for any
guidance on getting atomics up for nvptx.