Scientific computing and parallel computing C++23/C++26

Sat Jan 15 12:21:37 UTC 2022

On Saturday, 15 January 2022 at 00:29:20 UTC, Nicholas Wilson 
wrote:
> As for manpower, the reason is I don't have any personal 
> particular need for dcompute these days. I am happy to do 
> features for people that need something in particular, e.g. 
> Vulkan compute shader, textures, and PR are welcome. Though if 
> Bruce makes millions and gives me a job then that will 
> obviously change ;)

He can put me on the application list as well… This sounds like 
lots of fun!!!

> important is latency vs. throughput? How "powerful" is the GPU 
> compared to the CPU? How well suited to the task is the GPU? The 
> list goes on. It's hard enough to do CPU benchmarks in an 
> unbiased way.

I don't think people would expect benchmarks to be unbiased. It 
could be 3-4 short benchmarks, some showcasing where the GPU is 
beneficial, some showcasing where data dependencies (or other 
challenges) make it less suitable.

E.g.
1. compute autocorrelation over many different lags
2. multiply two long arrays elementwise and take the square root 
of each product
3. compute a simple IIR filter (I assume a recursive filter would 
be a worst case?)
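
To pin down what each of the three candidates would compute, here is a rough CPU-only reference sketch. It is plain Python rather than D/dcompute, and the function names, the one-pole form of the filter and the `alpha` parameter are my own assumptions for illustration, not anything taken from dcompute. The point is only to show where the data dependencies are.

```python
import math


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[k] = sum_i x[i] * x[i + k], k = 0..max_lag.

    Each lag k is independent of every other lag, so the outer loop
    could be split across many threads.
    """
    n = len(x)
    return [
        sum(x[i] * x[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def sqrt_of_product(a, b):
    """Elementwise sqrt(a[i] * b[i]); every output element is independent."""
    return [math.sqrt(p * q) for p, q in zip(a, b)]


def one_pole_iir(x, alpha):
    """Assumed filter: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1], y[-1] = 0.

    Each output needs the previous output, so this loop is serial as written.
    """
    y = []
    prev = 0.0
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    print(autocorrelation(signal, 2))
    print(sqrt_of_product([4.0, 9.0], [1.0, 4.0]))
    print(one_pole_iir([1.0, 1.0, 1.0], 0.5))
```

The first two have no dependency between outputs, while the filter carries state from one sample to the next, which is the kind of contrast the benchmark set is meant to show.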

> If the intention is to say, "look at the speedup you can get 
> for $TASK using $COMMON_HARDWARE" then yeah, that would be 
> possible. It would certainly be possible to do a benchmark of, 
> say, "ease of implementation with comparable performance" of 
> dcompute vs CUDA, e.g. LoC, verbosity, brittleness etc., since 
> the main advantage of D/dcompute (vs CUDA) is enumeration of 
> kernel designs for performance. That would give a nice 
> measurable goal to improve usability.

Yes, but I think of it more as inspiration, with a tutorial on 
how to get the benchmarks running. For instance, like you, I have 
no need for this at the moment, and my current computer isn't 
really a good showcase for GPU computation either, but I have one 
long-term hobby project where I might eventually use GPU 
computations.

I suspect many think of GPU computation as something that takes a 
significant amount of time to get into. Even though they may be 
interested, that threshold alone is enough to put it in the 
"interesting, but I'll look at it later" box.

If you can tease people into playing with it for fun, then I 
think there is a larger chance of them using it at a later stage 
(or even thinking about the possibility of using it) when they 
see a need in some heavy computational problem they are working 
on.

There is a lower threshold to get started with something new if 
you already have a tiny toy-project you can cut and paste from 
that you have written yourself.

Also, updated benchmarks could generate new interest in the 
announce forum thread. Lurking forum readers probably only read 
it on occasion, so you have to make several posts to make 
people aware of it.

> Definitely. Homogeneous memory is interesting for the ability 
> to make GPUs do the things GPUs are good at and leave the rest 
> to the CPU without worrying about memory transfer across the 
> PCIe bus. Something which CUDA can't take advantage of on 
> account of NVIDIA GPUs being only discrete. I've no idea how 
> caching works in a system like that though.

I don't know, but the Steam Deck, which appears to come out next 
month, seems to run Linux and has an "AMD APU" with a modern GPU 
and CPU integrated on the same chip, at least that is what I've 
read. Maybe more technical info on how that works at the hardware 
level will become available later, or maybe it is already on 
AMD's website?

If someone reading this thread has more info on this, it would be 
nice if they would share what they have found out! :-)