GPGPU progress

Nicholas Wilson via Digitalmars-d digitalmars-d at puremagic.com
Thu May 18 03:26:13 PDT 2017


On Thursday, 18 May 2017 at 09:07:38 UTC, Nicholas Wilson wrote:
> When ldc runs you will get a kernels_cudaxxx_yy.ptx (where xxx 
> is the CUDA compute capability specified on the command line 
> and yy is 32 or 64 for 32 or 64bit) which should fit somewhere 
> into your existing C++ pipeline.

Whoops, that assumes you have a CUDA driver API pipeline in your 
C++ code, which, if you're asking, I'm not sure that you have.
If you're using the `kernel<<<...>>>(args)` form to launch your 
kernels then you are going to have a lot more work to do in D, 
because you'll need to use the driver API 
(http://docs.nvidia.com/cuda/cuda-driver-api/#axzz4hQLA0Zdm).
You'll need to:
*get a device
*create a context from it
*get a stream on that context
*load the PTX module (possibly linking it with other modules to 
resolve missing symbols)
*compile it for the device
*then launch a kernel from that module on that device, by name, 
passing the arguments in a void*[].
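The steps above map onto the driver API roughly as in the sketch 
below. It's untested here (it needs the CUDA toolkit headers and a 
working NVIDIA driver to run), and the module filename 
"kernels_cuda350_64.ptx" and kernel name "myKernel" are placeholder 
names, not anything dcompute actually emits:

```c
#include <cuda.h>

int main(void)
{
    CUdevice   dev;
    CUcontext  ctx;
    CUstream   stream;
    CUmodule   mod;
    CUfunction fn;

    cuInit(0);
    cuDeviceGet(&dev, 0);           /* get a device */
    cuCtxCreate(&ctx, 0, dev);      /* create a context from it */
    cuStreamCreate(&stream, 0);     /* get a stream on that context */

    /* load the PTX module; the driver JIT-compiles it for the device */
    cuModuleLoad(&mod, "kernels_cuda350_64.ptx");

    /* look the kernel up by name */
    cuModuleGetFunction(&fn, mod, "myKernel");

    float x = 42.0f;
    void *args[] = { &x };          /* arguments go in a void*[] */
    cuLaunchKernel(fn,
                   1, 1, 1,         /* grid dims  */
                   1, 1, 1,         /* block dims */
                   0, stream, args, NULL);

    cuStreamSynchronize(stream);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```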

The sad thing is that it's still nicer than OpenCL, because in 
OpenCL you have to pass the kernel arguments (with their sizes) one 
by one to a function.
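For contrast, a hedged OpenCL sketch: each argument is set 
individually with its size before enqueueing the launch. The names 
`queue`, `kern`, `buf`, and `n` are assumed to come from earlier 
setup that isn't shown:

```c
#include <CL/cl.h>

/* One clSetKernelArg call per argument, each with an explicit size,
 * versus the single void*[] the CUDA driver API takes. */
static void launch(cl_command_queue queue, cl_kernel kern,
                   cl_mem buf, cl_int n, size_t global)
{
    clSetKernelArg(kern, 0, sizeof(cl_mem), &buf);  /* arg 0, with size */
    clSetKernelArg(kern, 1, sizeof(cl_int), &n);    /* arg 1, with size */
    clEnqueueNDRangeKernel(queue, kern, 1, NULL, &global,
                           NULL, 0, NULL, NULL);
}
```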

Hence why I want to automate as much of that shit as possible.
  I hope to have that done ASAP, but I don't have hardware set up 
to test CUDA at the moment (I have a card in my Windows box but I 
don't have a dev environment set up there), and I'll be working on 
OpenCL at the same time (and there's only so much horrible API I 
can take in a day).
  I'll be working on dcompute part-part-time next semester though, 
so I should be able to get a fair bit done, and quite a few others 
are interested, so that'll speed things up a bit.
