GPGPU progress

Tue May 30 02:54:55 PDT 2017

On Tuesday, 30 May 2017 at 08:14:16 UTC, Manu wrote:
> On 30 May 2017 at 17:33, Nicholas Wilson via Digitalmars-d <digitalmars-d at puremagic.com> wrote:
>
>> On Thursday, 18 May 2017 at 05:39:52 UTC, Manu wrote:
>>
>>> How far are we from integration into LDC without using forked 
>>> compilers?
>>>
>>
>> The future is now!
>>
>> https://forum.dlang.org/thread/zcfqujlgnultnqfksbjh@forum.dlang.org
>>
>> https://github.com/ldc-developers/ldc/commit/69ad69e872f53c14c101e2c029c4757c4073f487
>> is the final commit from the stuff I've done prior to dconf.
>>
>
> Awesome stuff! That was fast :)
>
> You're right, I'm using kernel<<<...>>>, and it's very 
> convenient.

Yep, that's (one of the reasons) why CUDA is more successful than 
OpenCL, and therefore one of the more powerful draws for those 
poor sods using OpenCL.

> I looked briefly and realised that I had a lot of work to get 
> running (as you describe), so I stuck with my current setup for 
> the moment :(
>

I figured. I'll get you using it eventually.

> Is a <<<...>>> equivalent going to be possible in D, with 
> kernel object fragments built into the binary together with the 
> CPU code?

As I explained in my DConf presentation, the idea is to have

    Queue q = ... ; // the equivalent of a CUDA stream
    q.enqueue!kernel(sizes)(kernel_arguments);

where q.enqueue returns a callable that you then call with the 
kernel arguments. It was modelled directly after CUDA's <<<...>>> 
syntax.
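To make the two-step call concrete, here is a minimal CPU-only sketch in D. Everything in it is hypothetical: `Queue`, `Sizes`, `saxpy` and `demo` are illustrative names, not the dcompute API, and "launching" just runs the kernel body once per work item on the host. It only shows the shape of the design: the kernel is bound as a compile-time alias and the launch sizes at run time, and the returned delegate takes the kernel arguments, much as `kernel<<<grid, block, 0, stream>>>(args)` does in CUDA.

```d
import std.stdio : writeln;
import std.traits : Parameters;

// Hypothetical launch configuration. A real queue would hand these to the
// driver as grid/block sizes; this CPU mock only uses `global`.
struct Sizes
{
    size_t global;
    size_t local;
}

// Hypothetical stand-in for a device queue (the equivalent of a CUDA stream).
struct Queue
{
    // Bind the kernel at compile time and the launch sizes at run time, and
    // return a callable that takes the remaining kernel arguments.
    auto enqueue(alias kernel)(Sizes sizes)
    {
        // Convention assumed by this sketch: the kernel's first parameter is
        // the work-item index, the rest are supplied by the caller.
        alias AllArgs = Parameters!kernel;
        alias KArgs = AllArgs[1 .. $];

        return delegate void(KArgs args) {
            // "Launch": run the kernel body once per global work item on the CPU.
            foreach (i; 0 .. sizes.global)
                kernel(i, args);
        };
    }
}

// Example kernel: y[i] = a * x[i] + y[i] for one work item.
void saxpy(size_t i, float a, const(float)[] x, float[] y)
{
    y[i] = a * x[i] + y[i];
}

float[] demo()
{
    Queue q;
    float[] x = [1.0f, 2.0f, 3.0f];
    float[] y = [10.0f, 20.0f, 30.0f];
    // Same shape as q.enqueue!kernel(sizes)(kernel_arguments).
    q.enqueue!saxpy(Sizes(3, 1))(2.0f, x, y);
    return y;
}

void main()
{
    writeln(demo());
}
```

Splitting the call this way keeps the launch configuration out of the kernel's own parameter list, which is what makes the `<<<...>>>` analogy work without new syntax.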

As for embedding in the binary, a post-build step that does

    // D string import: the kernel directory must be passed with -J<dir>
    immutable ubyte[] ptx_code = cast(immutable(ubyte)[]) import("kernels_cuda620_64.ptx");

should be doable, as should invoking ptxas and doing the same with 
its output. Then, provided a consistent naming convention is used, 
the code can do its magic. Or the files could just be read from disk.
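The naming-convention idea could look roughly like the sketch below. The scheme `kernels_<backend><major><minor>0_<bits>.ptx` is only inferred from the single example file name `kernels_cuda620_64.ptx` and is hypothetical, as are `kernelFileName` and `loadKernels`. The compile-time route uses D's string-import expression (which needs `-J<dir>` at build time); the run-time route reads the file with `std.file.read`.

```d
import std.file : read;
import std.format : format;
import std.path : buildPath;
import std.stdio : writeln;

// Hypothetical naming scheme inferred from "kernels_cuda620_64.ptx":
//   kernels_<backend><major><minor>0_<pointer bits>.ptx
// e.g. backend "cuda", compute capability 6.2, 64-bit pointers.
string kernelFileName(string backend, uint major, uint minor, uint bits)
{
    return format("kernels_%s%d%d0_%d.ptx", backend, major, minor, bits);
}

// Option 1 (compile time): embed the file; requires -J<dir> and the file to
// exist when the host code is compiled, so it is left commented out here.
// immutable ubyte[] embedded =
//     cast(immutable(ubyte)[]) import("kernels_cuda620_64.ptx");

// Option 2 (run time): read the file matching the device from a directory.
ubyte[] loadKernels(string dir, string backend, uint major, uint minor, uint bits)
{
    return cast(ubyte[]) read(buildPath(dir, kernelFileName(backend, major, minor, bits)));
}

void main()
{
    // Name for a compute capability 6.2 device, matching the host pointer size.
    writeln(kernelFileName("cuda", 6, 2, cast(uint)(8 * size_t.sizeof)));
}
```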

>
> I'm definitely looking forward to action in this space, and the 
> wiki to come online :)

Yeah, once my thesis is done things should start moving. Any input 
from your CUDA expertise will be much appreciated.