Scientific computing and parallel computing C++23/C++26

Thu Jan 20 06:57:28 UTC 2022

On Thursday, 20 January 2022 at 00:43:30 UTC, Nicholas Wilson 
wrote:
> I mean there are parametric attributes of the hardware, say for 
> example cache size (or available registers for GPUs), that have 
> a direct effect on how many times you can unroll the inner 
> loop, say for a windowing function, and you want to ship 
> optimised code for multiple configurations of hardware.
>
> You can much more easily create multiple copies for different 
> sized cache (or register availability) in D than you can in 
> C++, because static foreach and static if >>> if constexpr.

Hmm, I don't understand. Shouldn't the unrolling happen at runtime, 
so that you can target all GPUs with one executable?

If you have to do the unrolling in D, then a lot of the advantage 
is lost and I might just as well write in a shader language...
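
For reference, here is a rough C++17 sketch of how I read the "multiple 
copies" approach: the unroll factor is a template parameter, so each 
variant is unrolled at compile time, and one executable picks a variant 
at runtime. The names unrolled_sum and sum_for_cache and the cache-size 
thresholds are made up for illustration; they are not from any real 
library, and a real windowing kernel would do more than sum.

```cpp
#include <cstddef>
#include <utility>

// Adds p[0] .. p[sizeof...(K) - 1] to acc. The fold expression expands
// into one addition per index, so there is no loop left in this body.
template <std::size_t... K>
inline void add_block(float& acc, const float* p, std::index_sequence<K...>) {
    ((acc += p[K]), ...);
}

// Sums n floats, processing Unroll elements per iteration of the main loop.
template <std::size_t Unroll>
float unrolled_sum(const float* data, std::size_t n) {
    static_assert(Unroll > 0, "unroll factor must be positive");
    float acc = 0.0f;
    std::size_t i = 0;
    for (; i + Unroll <= n; i += Unroll) {
        add_block(acc, data + i, std::make_index_sequence<Unroll>{});
    }
    for (; i < n; ++i) {
        acc += data[i];  // leftover elements that do not fill a block
    }
    return acc;
}

// Hypothetical runtime dispatch: the thresholds are assumptions chosen
// only to show the shape of the selection, not tuned values.
float sum_for_cache(const float* data, std::size_t n, std::size_t cache_bytes) {
    if (cache_bytes >= 64 * 1024) return unrolled_sum<8>(data, n);
    if (cache_bytes >= 16 * 1024) return unrolled_sum<4>(data, n);
    return unrolled_sum<2>(data, n);
}
```

Note that this still fixes the set of unroll factors when the host code 
is compiled. It is not the same as generating the unrolled kernel at 
runtime for whatever GPU happens to be present.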