DCompute - Native heterogeneous computing for D - is here!

Mon Feb 27 16:22:17 PST 2017

On Sunday, 26 February 2017 at 08:37:29 UTC, Nicholas Wilson 
wrote:
> DCompute is an extension to LDC capable of generating code 
> (with no language changes*) for NVIDIA's NVPTX for use with 
> CUDA, SPIRV for use with the OpenCL runtime, and of course the 
> host, all at the same time! It is also possible to share 
> implementation of algorithms across the host and device.
> This will enable writing kernels in D utilising all of D's meta 
> programming goodness across the device divide and will allow 
> launching those kernels with a level of ease on par with CUDA's 
> <<<...>>> syntax. I hope to be giving a talk at DConf2017 about 
> this ;), what it enables us to do, what still needs to be done 
> and future plans.
>
> DCompute supports all of OpenCL except Images and Pipes 
> (support is planned though).
> I haven't done any test for CUDA so I'm not sure about the 
> extent of support for it, all of the math stuff works, 
> images/textures not so sure.
>
> Many thanks to the ldc team (especially Johan) for their 
> guidance and patience, Ilya for reminding me that I should 
> upstream my work and John Colvin for his DConf2016 talk for 
> making me think 'surely compiler support can't be too hard'. 10 
> months later: here it is!
>
> The DCompute compiler is available at the dcompute branch of 
> ldc [0], you will need my fork of llvm here[1] and the SPIRV 
> submodule that comes with it [2] as the llvm to link against. 
> There is also a tool for interconversion [3] (I've mucked up 
> the submodules a bit, sorry, just clone it into 
> 'tools/llvm-spirv', it's not necessary anyway). The device 
> standard library and drivers (both WIP) are available here[4].
>
> Please send bug reports to their respective components, 
> although I'm sure I'll see them anyway regardless of where they 
> go.
>
> [0]: https://github.com/ldc-developers/ldc/tree/dcompute
> [1]: https://github.com/thewilsonator/llvm/tree/compute
> [2]: https://github.com/thewilsonator/llvm-target-spirv
> [3]: https://github.com/thewilsonator/llvm-tool-spirv
> [4]: https://github.com/libmir/dcompute
>
> * modulo one hack related to resolving intrinsics because there 
> is no static context (i.e. static if) for the device(s). 
> Basically a 'codegen time if'.

A simple example, because I forgot to include one.

```
@compute(CompileFor.deviceOnly) module example;
import ldc.attributes;
import ldc.dcomputetypes;
import dcompute.std.index;

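// Each work item adds one element of b into the matching element of a.
@kernel void test(GlobalPointer!float a, GlobalPointer!float b)
{
    // This work item's global index along the x dimension.
    auto idx = GlobalIndex.x;
    a[idx] = a[idx] + b[idx];
}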
```

Then compile with `ldc -mdcompute-targets=ocl-220,cuda-500 
example.d -I/path/to/dcompute`. This produces two files: 
kernels_ocl220_64.spv and kernels_cuda500_64.ptx when built in 
64-bit mode, or kernels_ocl220_32.spv and kernels_cuda500_32.ptx 
in 32-bit mode.
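
For anyone new to this model: the `test` kernel above is an element-wise add, with each work item handling one index. Here is a rough host-only sketch of the same computation in plain D, useful for checking device results. It assumes GlobalIndex.x is the work item's global index along x and that both buffers have the same length; `addReference` is a made-up name for illustration, not part of DCompute.

```d
import std.stdio : writeln;

// Hypothetical host-side reference for the `test` kernel.
// The loop variable plays the role of GlobalIndex.x: on the device
// each iteration would be a separate work item.
void addReference(float[] a, const(float)[] b)
{
    assert(a.length == b.length, "buffers must have the same length");
    foreach (idx; 0 .. a.length)
        a[idx] = a[idx] + b[idx];
}

void main()
{
    float[] a = [1.0f, 2.0f, 3.0f];
    float[] b = [10.0f, 20.0f, 30.0f];
    addReference(a, b);
    writeln(a); // a now holds the element-wise sums
}
```

Running the device kernel over the same inputs and comparing against this loop is a cheap sanity check for either backend.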