GPGPUs
luminousone
rd.hunt at gmail.com
Fri Aug 16 12:55:54 PDT 2013
> The core (!) point here is that processor chips are rapidly
> becoming a
> collection of heterogeneous cores. Any programming language
> that assumes
> a single CPU or a collection of homogeneous CPUs has built-in
> obsolescence.
>
> So the question I am interested in is whether D is the language
> that can
> allow me to express in a single codebase a program in which
> parts will
> be executed on one or more GPGPUs and parts on multiple CPUs. D
> has
> support for the latter, std.parallelism and std.concurrency.
>
> I guess my question is whether people are interested in
> std.gpgpu (or
> some more sane name).
CUDA works as a preprocessor pass that generates C files from .cu
source files.
In effect, to create a sensible environment for microthreaded
programming, they extend the language.
A basic CUDA kernel looks something like this:

__global__ void add(float *a, float *b, float *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

// launch one block of ten threads
add<<<1, 10>>>(ptrA, ptrB, ptrC);
There are built-in variables to handle the index location,
threadIdx.x in the above example; these are filled in by the thread
scheduler on the video card/APU device.
Generally, calls to this setup have very high latency, so using it
for a small handful of items as in the above example makes no sense.
That example would end up using a single execution cluster and leave
you prey to the latency of the PCIe bus, the execution time, and the
latency costs of the video memory.
It only becomes effective when you are working with large data sets
that can take advantage of a massive number of threads, where the
latency problems become secondary to the sheer amount of calculation
being done.
As far as D goes, we really only have one built-in
microthreading-capable language construct: foreach.
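For comparison, CPU-side microthreading with today's std.parallelism
looks roughly like this (a minimal sketch; the array sizes and values
are just illustrative):

import std.parallelism : parallel;
import std.stdio : writeln;

void main()
{
    auto a = new float[1_000_000];
    auto b = new float[1_000_000];
    auto c = new float[1_000_000];
    a[] = 1.0f;
    b[] = 2.0f;

    // Each chunk of iterations runs on a different CPU worker thread;
    // the loop index plays the role threadIdx.x plays on the GPU.
    foreach (i, ref elem; parallel(c))
        elem = a[i] + b[i];

    writeln(c[0]); // 3
}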
However, I don't think a library extension similar to
std.parallelism would work for GPU-based microthreading.
foreach would need something to tell the compiler to generate GPU
bytecode for the code block it uses, and would need instructions on
when to use said code block based on data set size.
While functions themselves would need very little change (just a new
@microthreaded property and the built-in variables for the index
position(s)), the calling syntax would need changes to support a work
range or multidimensional range of some sort.
Perhaps looking something like:

add$(1 .. 10)(ptrA, ptrB, ptrC);

and, for a templated function, something similar:

add!(float)$(1 .. 10)(ptrA, ptrB, ptrC);
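None of that syntax exists today, of course. Purely to make the call
shape concrete, here is a CPU-only stand-in (the helper name
microthreaded is hypothetical); it just forwards the work range to
std.parallelism and involves no GPU at all:

import std.parallelism : parallel;
import std.range : iota;

// Hypothetical helper: runs kernel(i, args) for each index in [lo, hi).
// This is only a CPU emulation of the proposed add$(lo .. hi)(args) shape.
void microthreaded(alias kernel, Args...)(size_t lo, size_t hi, Args args)
{
    foreach (i; parallel(iota(lo, hi)))
        kernel(i, args);
}

void add(size_t i, float[] a, float[] b, float[] c)
{
    c[i] = a[i] + b[i];
}

void main()
{
    auto a = [1.0f, 2.0f, 3.0f];
    auto b = [4.0f, 5.0f, 6.0f];
    auto c = new float[3];
    microthreaded!add(0, 3, a, b, c); // stands in for add$(0 .. 3)(a, b, c)
}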