GPGPUs
John Colvin
john.loughran.colvin at gmail.com
Fri Aug 16 13:07:31 PDT 2013
On Friday, 16 August 2013 at 19:55:56 UTC, luminousone wrote:
>> The core (!) point here is that processor chips are rapidly
>> becoming a collection of heterogeneous cores. Any programming
>> language that assumes a single CPU, or a collection of
>> homogeneous CPUs, has built-in obsolescence.
>>
>> So the question I am interested in is whether D is the
>> language that can allow me to express in a single codebase a
>> program in which parts will be executed on one or more GPGPUs
>> and parts on multiple CPUs. D has support for the latter:
>> std.parallelism and std.concurrency.
>>
>> I guess my question is whether people are interested in
>> std.gpgpu (or some more sane name).
>
> CUDA works as a preprocessor pass that generates C files from
> .cu source files.
>
> In effect, to create a sensible environment for microthreaded
> programming, they extend the language.
>
> A basic CUDA kernel looks something like this:
>
> __global__ void add(float *a, float *b, float *c) {
>     int i = threadIdx.x;
>     c[i] = a[i] + b[i];
> }
>
> add<<<1, 10>>>(ptrA, ptrB, ptrC);
>
> There are built-in variables to handle the index location
> (threadIdx.x in the above example); these are provided by the
> thread scheduler on the video card/APU device.
>
> Generally, calls of this kind have very high latency, so using
> them for a small handful of items as in the above example makes
> no sense: that would end up using a single execution cluster
> and leave you prey to the latency of the PCIe bus, the
> execution time, and the latency costs of video memory.
>
> It only becomes effective when you are working with large data
> sets that can take advantage of a massive number of threads,
> where the latency problems are secondary to the sheer amount
> of calculation being done.
>
> As far as D goes, we really only have one built-in
> microthreading-capable language construct: foreach.
>
> However, I don't think a library extension similar to
> std.parallelism would work for GPU-based microthreading.
>
> foreach would need something to tell the compiler to generate
> GPU bytecode for the code block it uses, and would need
> instructions on when to use said code block based on the
> dataset size.
>
> While it is entirely possible to get away with very little
> change to functions themselves (just add a new property,
> @microthreaded, plus the built-in variables for the index
> position(s)), the calling syntax would need changes to support
> a work range or a multidimensional range of some sort.
>
> Perhaps looking something like:
>
> add$(1 .. 10)(ptrA, ptrB, ptrC);
>
> A templated function would look similar:
>
> add!(float)$(1 .. 10)(ptrA, ptrB, ptrC);
We have a[] = b[] * c[] - 5; etc., which could perhaps work
very neatly here?
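For what it's worth, that array-operation syntax is element-wise, so a rough sketch of the loop it lowers to might be (C++ with illustrative names; a serial loop stands in for whatever parallel backend might run it):

```cpp
#include <cstddef>
#include <vector>

// Sketch of D's array operation a[] = b[] * c[] - 5; as the
// element-wise loop it lowers to. A GPU backend could in
// principle map each iteration to one thread. Illustrative only.
std::vector<float> array_op(const std::vector<float> &b,
                            const std::vector<float> &c) {
    std::vector<float> a(b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = b[i] * c[i] - 5.0f;
    return a;
}
```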
More information about the Digitalmars-d mailing list