D and GPGPU
luminousone via Digitalmars-d
digitalmars-d at puremagic.com
Wed Feb 18 10:14:17 PST 2015
On Wednesday, 18 February 2015 at 15:15:21 UTC, Russel Winder wrote:
> It strikes me that D really ought to be able to work with GPGPU – or is
> there already something and I just failed to notice? This is data
> parallelism, but of a slightly different sort to that in std.parallelism.
> std.concurrency, std.parallelism and std.gpgpu ought to be harmonious
> though.
>
> The issue is to create a GPGPU kernel (usually C code with bizarre data
> structures and calling conventions), set it running, and then pipe data
> in and collect data out – currently very slow, but the next generation
> of Intel chips will fix this (*). And then there is the OpenCL/CUDA
> debate.
>
> Personally I favour OpenCL, for all its deficiencies, as it is vendor
> neutral; CUDA binds you to NVIDIA. Anyway, there is an NVIDIA back end
> for OpenCL. With a system like PyOpenCL, the infrastructure, data and
> process handling is abstracted, but you still have to write the kernels
> in C. They really ought to do a Python DSL for that, but… So with D, can
> we write D kernels and have them compiled and loaded using a combination
> of CTFE, D → C translation, a C compiler call, and other magic?
>
> Is this a GSoC 2015 type thing?
>
>
> (*) It will be interesting to see how NVIDIA responds to the tack Intel
> are taking on GPGPU and main memory access.
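
As a concrete illustration of the workflow described above, in which the kernel source is held as a C string and compiled at run time, here is a small Python sketch. The `elementwise_src` helper is an invented stand-in for the kind of source-generating "Python DSL" the post wishes existed; only the PyOpenCL calls in the trailing comment are real API, and they are not executed here:

```python
# Hypothetical sketch of the "kernel as a C source string" workflow.
# elementwise_src is an invented illustration, not a real library.

KERNEL_TEMPLATE = """\
__kernel void {name}(__global const float *a,
                     __global const float *b,
                     __global float *out)
{{
    size_t i = get_global_id(0);
    out[i] = {expr};
}}
"""

def elementwise_src(name, expr):
    """Generate OpenCL C source for a one-dimensional elementwise kernel."""
    return KERNEL_TEMPLATE.format(name=name, expr=expr)

src = elementwise_src("vec_add", "a[i] + b[i]")
print(src)

# With PyOpenCL, this string would then be compiled and launched at run
# time, roughly like so (requires a context, queue, and device buffers):
#   prg = pyopencl.Program(ctx, src).build()
#   prg.vec_add(queue, (n,), None, a_buf, b_buf, out_buf)
```

A D take on the same idea could generate that string at compile time with CTFE, which is essentially the translation pipeline the post is asking about.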
https://github.com/HSAFoundation
This is really the way to go. Yes, OpenCL and CUDA exist, along with
OpenGL/DirectX compute shaders, but pretty much everything out there
suffers from giant limitations.
With HSA, HSAIL bytecode is embedded directly into the ELF/EXE file.
HSAIL bytecode can fully support all the features of C++: virtual
function lookups, access to the stack, cache-coherent memory access,
the same virtual memory view as the application it runs in, etc.
HSA is implemented in the LLVM back end, and when HSAIL is embedded in
an ELF/EXE file, an LLVM-based finalizer translates it into native GPU
code at load time.
More importantly, it should be very easy to implement in any
LLVM-supported language once all of the patches are moved upstream to
their respective libraries/toolsets.
I believe that Linux kernel 3.19 and above has the IOMMUv2.5 patches,
and I think AMD's Radeon KFD driver made it into 3.20.
HSA will also be supported by ARM.
HSA is generic enough that, assuming Intel implements similar
capabilities in its chips, it ought to be supportable there with or
without Intel's direct blessing.
HSA also works with discrete GPUs, not just the embedded stuff, and I
believe it can be used to accelerate OpenCL 2.0 via copyless,
cache-coherent memory access.
More information about the Digitalmars-d mailing list