D and GPGPU

Paulo Pinto via Digitalmars-d digitalmars-d at puremagic.com
Wed Feb 18 23:57:53 PST 2015


On Wednesday, 18 February 2015 at 18:14:19 UTC, luminousone wrote:
> On Wednesday, 18 February 2015 at 15:15:21 UTC, Russel Winder 
> wrote:
>> It strikes me that D really ought to be able to work with
>> GPGPU – or is there already something and I just failed to
>> notice it? This is data parallelism, but of a slightly
>> different sort to that in std.parallelism. std.concurrency,
>> std.parallelism and a std.gpgpu ought to be harmonious though.
>>
>> The issue is to create a GPGPU kernel (usually C code with
>> bizarre data structures and calling conventions), set it
>> running, and then pipe data in and collect data out –
>> currently very slow, but the next generation of Intel chips
>> will fix this (*). And then there is the OpenCL/CUDA debate.
>>
>> Personally I favour OpenCL, for all its deficiencies, as it is
>> vendor neutral; CUDA binds you to NVIDIA, and anyway there is
>> an NVIDIA back end for OpenCL. With a system like PyOpenCL,
>> the infrastructure, data and process handling is abstracted,
>> but you still have to write the kernels in C. They really
>> ought to do a Python DSL for that, but… So with D, can we
>> write D kernels and have them compiled and loaded using a
>> combination of CTFE, D → C translation, a C compiler call,
>> and other magic? [A sketch of the idea follows after the
>> quoted text.]
>>
>> Is this a GSoC 2015 type thing?
>>
>>
>> (*) It will be interesting to see how NVIDIA responds to the 
>> tack Intel
>> are taking on GPGPU and main memory access.
>
> https://github.com/HSAFoundation
>
> This is really the way to go. Yes, OpenCL and CUDA exist, along
> with OpenGL/DirectX compute shaders, but pretty much everything
> out there suffers from giant limitations.
>
> With HSA, HSAIL bytecode is embedded directly into the ELF/EXE
> file, and HSAIL can fully support all the features of C++:
> virtual function lookups in code, access to the stack,
> cache-coherent memory access, the same virtual memory view as
> the application it runs in, etc.
>
> HSA support is implemented in the LLVM back end, and when it is
> used in an ELF/EXE file, there is an LLVM-based finalizer that
> turns the HSAIL bytecode into GPU code.
>
> More importantly, it should be very easy to implement in any
> LLVM-supported language once all of the patches are moved
> upstream to their respective libraries/toolsets.
>
> I believe that Linux kernel 3.19 and above have the IOMMU 2.5
> patches, and I think AMD's Radeon KFD driver made it into 3.20.
> HSA will also be supported by ARM.
>
> HSA is generic enough that, assuming Intel implements similar
> capabilities in their chips, it ought to be supportable there
> with or without Intel's direct blessing.
>
> HSA does work with discrete GPUs and not just the embedded
> stuff, and I believe that HSA can be used to accelerate OpenCL
> 2.0 via copy-free, cache-coherent memory access.
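
As a rough illustration of the kernel workflow and the CTFE idea
raised in the quoted text, here is a minimal D sketch. It is not a
working GPGPU layer, and the names (saxpyCpu, makeSaxpyKernel) are
invented for the example: the std.parallelism part runs on CPU
threads today, and the CTFE part only assembles OpenCL C source
that a real wrapper would still have to hand to
clCreateProgramWithSource/clBuildProgram and enqueue with buffers
and a command queue.

// Minimal sketch: CPU-side data parallelism with std.parallelism,
// plus an OpenCL C kernel source string assembled at compile time.
import std.parallelism : parallel;
import std.stdio : writeln;

// CPU version: the work-sharing std.parallelism already provides,
// with each chunk of the array handled by a worker thread.
void saxpyCpu(float a, const float[] x, float[] y)
{
    foreach (i, ref yi; parallel(y))
        yi += a * x[i];
}

// Compile-time assembly of an OpenCL C kernel. The `expr` string
// stands in for whatever a real D -> OpenCL C translator would
// emit from a D lambda; here it is plain string concatenation.
string makeSaxpyKernel(string expr)
{
    return "__kernel void saxpy(const float a,\n" ~
           "                    __global const float* x,\n" ~
           "                    __global float* y)\n" ~
           "{\n" ~
           "    size_t i = get_global_id(0);\n" ~
           "    " ~ expr ~ "\n" ~
           "}\n";
}

// The enum initializer forces CTFE, so the kernel source exists as
// a compile-time constant embedded in the binary.
enum saxpyKernelSource = makeSaxpyKernel("y[i] += a * x[i];");

void main()
{
    auto x = new float[](1024);
    auto y = new float[](1024);
    x[] = 1.0f;
    y[] = 2.0f;

    saxpyCpu(3.0f, x, y);
    writeln("y[0] after CPU saxpy: ", y[0]); // 5

    // A GPGPU wrapper would compile and enqueue saxpyKernelSource
    // here; this sketch only shows that the source is available.
    writeln(saxpyKernelSource);
}

The missing piece is exactly what the thread is about: the host-side
plumbing (contexts, buffers, queues) and a D-to-kernel translator so
the kernel body is written once in D rather than as a C string.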

Java will support HSA as of Java 9 or 10, depending on the
project's progress.

http://openjdk.java.net/projects/sumatra/

https://wiki.openjdk.java.net/display/Sumatra/Main

--
Paulo

