ldc/dcompute and shared (programmer managed cache) access
Bruce Carneal
bcarneal at gmail.com
Fri Mar 5 04:49:56 UTC 2021
On Friday, 5 March 2021 at 00:39:29 UTC, kinke wrote:
> On Thursday, 4 March 2021 at 04:52:52 UTC, Bruce Carneal wrote:
>> As a slightly cleaner alternative I'll next look at using
>> ldc/dcompute as a .o and .ptx generator while punting the
>> fatbin/linking stuff to clang or nvcc in a build script. The
>> current simplicity of single-ptx-file is very nice but
>> foregoing shared memory performance boosts is not nice so ...
>> I'm pretty sure that we'll need to move beyond the single ptx
>> file model if we want to embrace shared cleanly.
>
> Not sure if it's of any help, but using something like
> `-mtriple=nvptx64 -mcpu=sm_50` might suffice:
> https://github.com/ldc-developers/ldc/pull/3411#issuecomment-619385886
Per the other thread that you've seen, and helped with, the
intrinsics are mostly good to go. Thanks.
Unfortunately CUDA __shared__ is not, AFAICT, addressed by the
pull request that you cited. We need some way to get a
__shared__ declaration emitted to the PTX file.
The code generation through standalone (uninitialized) __shared__
pointers, which we *can* declare, already emits the correct PTX
instructions (ld.shared, st.shared) but we have no way, at least
that I've found, to declare __shared__ address space storage that
we can point to. (clang cuda can do this, nvvc can obviously do
this, and we can do this currently only if we manually modify the
.PTX file)
The newly available shuffle intrinsics provide for efficient
cooperative processing at the warp (32 lane) level, which helps a
lot. The __shared__ address space capability would allow for
more efficient cooperation for up to 1024 lanes (an Nvidia
"block" upper limit currently IIUC).
As I understand it the L1 caches on Nvidia cards are now
writeback so, with the very relaxed memory model in play on GPUs
an explicit programmer scratchpad (__shared__) is not as
important as it used to be for many apps but in some situations
it's still important. Would be great if we could find a way to
provide the capability some where down the road.
More information about the digitalmars-d-ldc
mailing list