D and Heterogeneous Computing

Tue Apr 10 14:58:27 PDT 2012

On 11.04.2012 0:31, Josh Klontz wrote:
>> IIRC, doesn't OpenCL support jit-ing ASCII source files? Then, there
>> wouldn't be a need for any language changes.
>
> Correct, and that's the underlying power I'm proposing to
> leverage.
>
> IMO, writing OpenCL code involves (at least) the following
> nuisances:
> 1) The kernel code needs to be written as a text string within
> the native code base.
> 2) Various function calls to the OpenCL library need to be made
> to manage the runtime, compile kernels, connect arguments to
> kernels, execute the kernels, and retrieve the results.
> 3) If you want to build an application both with and without
> OpenCL as the backend then you have to maintain two versions of
> every algorithm, one as an OpenCL string and the other in the
> native language of your program.
>
> To me there seems to be a huge opportunity to obviate the above
> issues and entice new developers to D via some careful
> engineering at either the compiler or the standard library level
> to support heterogeneous computing. Certainly technologies like
> C++ AMP are a step in the right direction, but to my knowledge
> there currently doesn't exist anything with the following
> desirable principles:
> 1) Write the algorithm once, compile for both serial execution on
> the CPU or massively parallel execution on an OpenCL enabled
> device.
> 2) FOSS
> 3) Runs everywhere the underlying language runs.
> 4) The underlying language has a robust compiler, active and
> growing community, solid standard library, elegant language
> features, etc...
>
> Perhaps I was wrong to suggest that this has to be solved at the
> compiler level. The EPGPU library seems to tackle some of the
> problems of mixing OpenCL kernels within C++, though the syntax
> is far from ideal.
>
> Thoughts?

From the looks of it, this kind of thing should be easy with token 
strings ( q{ code } ) + mixins + some "auto-magic" helpers that drive 
OpenCL behind the scenes. The problematic part is checking that the 
fragment uses only the subset that is valid in both languages.

Ideally the API should work along these lines:

float[] arr1, arr2;
//init arr1 & arr2
assert(arr1.length == arr2.length);
auto length = arr1.length;
compute!q{
	for(int i=0;i<length; i++)
		arr1[i] += arr2[i];
}(arr1, arr2);

where compute works on a plain CPU even without OpenCL (by simply 
mixing the code in), and targets OpenCL with a bit of extra binding 
magic inside the compute template.

(compute is an eponymous template aliased to a static function inside 
it, which in turn is generated by a mixin. For a concrete example, take 
a look at how the ctRegex template in std.regex does it.)
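
To make that parenthetical a bit more concrete, here is a minimal 
CPU-only sketch of such a template. It is an illustration under some 
assumptions, not a finished design: the inner name impl, the fixed 
parameter list (arr1, arr2) and the convention that length is derived 
from arr1 inside the generated function are all mine, and there is no 
OpenCL path at all.

```d
// Hypothetical CPU fallback for compute!q{ ... }.
// Assumption: every kernel sees exactly three names: arr1, arr2, length.
template compute(string kernel)
{
    // Generate the whole function from text; the kernel fragment
    // becomes its body.
    mixin("static void impl(float[] arr1, const(float)[] arr2)\n"
        ~ "{\n"
        ~ "    immutable length = arr1.length;\n"
        ~ kernel ~ "\n"
        ~ "}");

    // Eponymous member: compute!q{...} now names the generated function.
    alias compute = impl;
}

void main()
{
    float[] arr1 = [1.0f, 2.0f, 3.0f];
    float[] arr2 = [10.0f, 20.0f, 30.0f];
    assert(arr1.length == arr2.length);

    compute!q{
        for (int i = 0; i < length; i++)
            arr1[i] += arr2[i];
    }(arr1, arr2);

    assert(arr1 == [11.0f, 22.0f, 33.0f]);
}
```

Because the fragment is mixed into a separate function, it cannot see 
the caller's locals; anything it needs has to arrive as a parameter or 
be derived from one, which is why length is computed inside impl here.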

Of course, there are some painful details once you go for deeper 
things, and with error messages, but it should be perfectly doable in 
normal D even without, say, a CTFE parser.
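
As a rough illustration of the "no parser needed" point, the OpenCL 
side can start from plain compile-time string concatenation: the same 
token string is wrapped in an OpenCL C kernel signature. The helper 
name toOpenCLSource, the kernel name add_arrays and the fixed 
signature are assumptions for this sketch. The result is only the 
source text that would later be handed to the OpenCL runtime; none of 
the host-side binding is shown, and the fragment still runs as a 
serial loop in a single work-item.

```d
// Hypothetical helper: wrap a fragment in an OpenCL C kernel signature.
// Assumption: the fragment uses only arr1, arr2 and length, and only
// syntax that is valid in both D and OpenCL C.
string toOpenCLSource(string name, string fragment)
{
    return "__kernel void " ~ name
         ~ "(__global float* arr1, __global const float* arr2,"
         ~ " const int length)\n{\n"
         ~ fragment ~ "\n}\n";
}

enum fragment = q{
    for (int i = 0; i < length; i++)
        arr1[i] += arr2[i];
};

// Built at compile time via CTFE; no parsing of the fragment involved.
enum source = toOpenCLSource("add_arrays", fragment);

static assert(source.length > fragment.length);

void main()
{
    import std.stdio : write;
    write(source); // show the generated OpenCL C text
}
```

Nothing here checks that the fragment really stays within the common 
subset; a mistake would surface only when the OpenCL compiler rejects 
the source at run time, which is the problematic part mentioned above.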

-- 
Dmitry Olshansky