ldc nvvm GPU intrinsics good news
Bruce Carneal
bcarneal at gmail.com
Fri Mar 5 00:03:26 UTC 2021
After changing the first line of the
runtime/import/ldc/gccbuiltins_nvvm.di file from a current LDC
build to '@compute(CompileFor.hostAndDevice) module ...' and adding
an 'import ldc.dcompute;' line, that file apparently gives access
to all manner of GPU intrinsics.
I've only tried it out on __syncthreads and __nvvm_shfl_down_i32
but both "invocations" of those LDC_intrinsics resulted in the
expected single ptx instruction in the dcompute .ptx output.
There are over 600 pragma(LDC_intrinsic, "llvm.nvvm.xxxxxx")
builtins in the gccbuiltins_nvvm.di file so, of course, I've not
hand tested them all but it looks very promising.
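
For anyone who wants to try this, here is a minimal sketch of a
dcompute kernel that calls the two intrinsics mentioned above. It
assumes the patched gccbuiltins_nvvm.di is on the import path as
ldc.gccbuiltins_nvvm. The module name, kernel name and argument
values are made up for illustration. The argument list shown for
__nvvm_shfl_down_i32 (value, lane delta, clamp) and the use of
__nvvm_read_ptx_sreg_tid_x to read the thread index are my
assumptions about the generated declarations, so check them against
your copy of the .di file.

```d
// Hypothetical module name, for illustration only.
@compute(CompileFor.deviceOnly) module shfl_sketch;

import ldc.dcompute;
import ldc.gccbuiltins_nvvm; // the patched file described above

@kernel void shflSketch(GlobalPointer!int data)
{
    // Assumption: this builtin is declared in the .di file and
    // returns threadIdx.x as an int.
    int tid = __nvvm_read_ptx_sreg_tid_x();
    int v = data[tid];

    // Assumed signature: (value, lane delta, clamp) -> int.
    // Adds in the value held by the lane 16 positions higher.
    v += __nvvm_shfl_down_i32(v, 16, 0x1f);

    // Block-wide barrier before the result is written back.
    __syncthreads();
    data[tid] = v;
}
```

Building with something like 'ldc2 -mdcompute-targets=cuda-350' (the
target version is only an example) should produce a .ptx file that
can be inspected for the shfl and bar.sync instructions.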
If you're working with dcompute on an OpenCL device, I'd love to
hear if something similar works for your use cases or if you've
found another way forward.