GPGPUs

Atash nope at nope.nope
Fri Aug 16 14:14:11 PDT 2013


On Friday, 16 August 2013 at 19:55:56 UTC, luminousone wrote:
>> The core (!) point here is that processor chips are rapidly 
>> becoming a
>> collection of heterogeneous cores. Any programming language 
>> that assumes
>> a single CPU or a collection of homogeneous CPUs has built-in
>> obsolescence.
>>
>> So the question I am interested in is whether D is the 
>> language that can
>> allow me to express in a single codebase a program in which 
>> parts will
>> be executed on one or more GPGPUs and parts on multiple CPUs. 
>> D has
>> support for the latter, std.parallelism and std.concurrency.
>>
>> I guess my question is whether people are interested in 
>> std.gpgpu (or
>> some more sane name).
>
> CUDA works as a preprocessor pass that generates C files from 
> .cu extension files.
>
> In effect, to create a sensible environment for microthreaded 
> programming, they extend the language.
>
> A basic CUDA function looks something like...
>
> __global__ void add( float * a, float * b, float * c) {
>    int i = threadIdx.x;
>    c[i] = a[i] + b[i];
> }
>
> add<<< 1, 10 >>>( ptrA, ptrB, ptrC );
>
> There are built-in variables to handle the index location 
> (threadIdx.x in the above example); these are supplied by the 
> thread scheduler on the video card/APU device.
>
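
Side note for anyone following along: threadIdx.x on its own only 
indexes within a single block, so the example above only works for 
a single block of threads. The conventional spelling computes a 
global index from the block coordinates as well; a minimal sketch 
(the bounds-check parameter n is my addition):

__global__ void add(const float * a, const float * b, float * c,
                    int n) {
    // Global index: which block we are in, times the block width,
    // plus our position within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)   // the grid is rounded up, so guard against overshoot
        c[i] = a[i] + b[i];
}
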
> Generally, calls set up this way have very high latency, so 
> using this for a small handful of items as in the above example 
> makes no sense. The above example would end up using a single 
> execution cluster, and leave you prey to the latency of the 
> PCIe bus, the execution time, and the latency costs of the 
> video memory.
>
> It doesn't become effective until you are working with large 
> data sets that can take advantage of a massive number of 
> threads, where the latency problems are secondary to the sheer 
> volume of calculations done.
>
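
To make the latency point concrete, here is roughly what the host 
side of such a call looks like when the data set is actually large. 
A minimal sketch, with error checking omitted and the size picked 
arbitrarily (the kernel is repeated so this compiles standalone):

#include <cuda_runtime.h>

__global__ void add(const float * a, const float * b, float * c,
                    int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 24;              // ~16M floats; large enough
                                        // to amortize transfer costs
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocHost(&a, bytes);          // pinned host memory speeds
    cudaMallocHost(&b, bytes);          // up the PCIe copies below
    cudaMallocHost(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    float *da, *db, *dc;                // device-side buffers
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice); // bus latency
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice); // paid here...
    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up to cover n
    add<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost); // ...and here
    cudaFree(da); cudaFree(db); cudaFree(dc);
    cudaFreeHost(a); cudaFreeHost(b); cudaFreeHost(c);
    return 0;
}

The two bus crossings happen no matter how small the work is; that 
fixed cost is why a ten-element launch is pointless.
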
> As far as D goes, we really only have one built-in 
> microthreading-capable language construct: foreach.
>
> However, I don't think a library extension similar to 
> std.parallelism would work for GPU-based microthreading.
>
> foreach would need something to tell the compiler to generate 
> GPU bytecode for the code block it uses, and would need 
> instructions on when to use said code block based on data-set 
> size.
>
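
Agreed that the decision has to be size-aware. Hand-sketching in 
CUDA terms what a compiler might effectively emit for such a 
foreach (addAuto and the threshold value are my inventions; the 
crossover point would need tuning per machine, and the 
four-argument add kernel from the sketches above is reused):

void addAuto(const float * a, const float * b, float * c, int n) {
    const int kGpuThreshold = 1 << 20;  // invented crossover point
    if (n < kGpuThreshold) {
        for (int i = 0; i < n; ++i)     // small n: a plain CPU loop
            c[i] = a[i] + b[i];         // wins; no bus crossing at all
        return;
    }
    const size_t bytes = n * sizeof(float);
    float *da, *db, *dc;                // large n: pay the transfer
    cudaMalloc(&da, bytes);             // cost and recoup it across
    cudaMalloc(&db, bytes);             // millions of microthreads
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
}
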
> While it is completely possible to change functions very 
> little (just add a new @microthreaded property and the 
> built-in variables for the index position(s)), the calling 
> syntax would need changes to support a work range or a 
> multidimensional range of some sort.
>
> Perhaps looking something like...
>
> add$(1 .. 10)(ptrA,ptrB,ptrC);
>
> A templated function would look similar:
>
> add!(float)$(1 .. 10)(ptrA,ptrB,ptrC);

Regarding functionality, @microthreaded is sounding a lot like 
the __kernel or __global__ keywords in OpenCL and CUDA. Is this 
intentional?

The more metaphors that can be drawn between extant tools and 
whatever we come up with, the better, methinks.
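
For concreteness, here is the same kernel under both spellings, 
side by side (the OpenCL C form is shown as a comment; note that 
get_global_id(0) is the *global* index, i.e. 
blockIdx.x * blockDim.x + threadIdx.x in CUDA terms):

__global__ void add(const float * a, const float * b, float * c) {
    int i = threadIdx.x;   // CUDA: index via built-in *variables*
    c[i] = a[i] + b[i];
}
/* OpenCL C equivalent:
__kernel void add(__global const float * a, __global const float * b,
                  __global float * c) {
    int i = get_global_id(0);  // OpenCL: index via built-in *function*
    c[i] = a[i] + b[i];
}
*/

An @microthreaded in D would presumably map onto whichever of these 
the backend targets.
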

