Scientific computing and parallel computing C++23/C++26
Ola Fosheim Grøstad
ola.fosheim.grostad at gmail.com
Thu Jan 20 13:29:26 UTC 2022
On Thursday, 20 January 2022 at 12:18:27 UTC, Bruce Carneal wrote:
> Because compilers are not sufficiently advanced to extract all
> the performance that is available on their own.
Well, but D developers cannot test on all available CPU/GPU
combinations either, so you don't know whether SIMD would perform
better than the GPU.
Something automated has to be present, at least at install time;
otherwise you risk a performance regression compared to a pure
SIMD implementation, in which case it is better (and cheaper) to
avoid the GPU altogether.
> A good example of where the automated/simple approach was not
> good enough is CUB (CUDA unbound), a high performance CUDA
> library found here https://github.com/NVIDIA/cub/tree/main/cub
> I'd recommend taking a look at the specializations that occur
> in CUB in the name of performance.
I am sure you are right, but I didn't find anything special when
I browsed through the repo.
> If you can achieve your performance objectives with automated
> or hinted solutions, great! But what if you can't?
Well, my gut instinct is that if you want maximal performance for
a specific GPU then you would be better off using
But I have no experience with that, as it is quite time-consuming
to go that route. Right now, basic SIMD is time-consuming enough…