Scientific computing and parallel computing C++23/C++26
Ola Fosheim Grøstad
ola.fosheim.grostad at gmail.com
Thu Jan 20 06:57:28 UTC 2022
On Thursday, 20 January 2022 at 00:43:30 UTC, Nicholas Wilson wrote:
> I mean there are parametric attributes of the hardware, say for
> example cache size (or available registers for GPUs), that have
> a direct effect on how many times you can unroll the inner
> loop, say for a windowing function, and you want to ship
> optimised code for multiple configurations of hardware.
> You can much more easily create multiple copies for different
> sized cache (or register availability) in D than you can in
> C++, because static foreach and static if >>> if constexpr.
Hmm, I don't understand. Shouldn't the unrolling happen at runtime, so
that you can target all GPUs with one executable?
If you have to do the unrolling in D, then a lot of the advantage
is lost and I might just as well write in a shader language...
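
To make the two positions concrete, here is a rough C++ sketch of the
"multiple copies in one executable" idea: each unroll factor is
instantiated at compile time, and one of the copies is picked at runtime.
The names window_sum and select_kernel, the cache-size thresholds, and the
unroll factors are all made up for illustration; this is plain CPU code
under those assumptions, not how any GPU toolchain actually does it.

```cpp
#include <cstddef>

// One copy of the kernel per unroll factor, generated at compile time.
// Unroll is a template parameter, so the inner loop has a constant trip
// count and the compiler is free to unroll it fully.
template <std::size_t Unroll>
float window_sum(const float* data, std::size_t n) {
    float acc[Unroll] = {};
    std::size_t i = 0;
    for (; i + Unroll <= n; i += Unroll) {
        for (std::size_t k = 0; k < Unroll; ++k) {
            acc[k] += data[i + k];
        }
    }
    float total = 0.0f;
    for (std::size_t k = 0; k < Unroll; ++k) {
        total += acc[k];
    }
    // Handle the tail that did not fill a whole unrolled step.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}

using Kernel = float (*)(const float*, std::size_t);

// Hypothetical runtime dispatch: cache_bytes would come from querying the
// hardware; the thresholds here are arbitrary placeholders.
Kernel select_kernel(std::size_t cache_bytes) {
    if (cache_bytes >= 1024 * 1024) return &window_sum<8>;
    if (cache_bytes >= 256 * 1024) return &window_sum<4>;
    return &window_sum<2>;
}
```

With this shape the executable ships every variant and the choice is made
once at startup, e.g. `Kernel k = select_kernel(cache_bytes);`. In D the
list of variants could presumably be generated with static foreach instead
of being written out by hand.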