Scientific computing and parallel computing C++23/C++26

Thu Jan 20 00:43:30 UTC 2022

On Wednesday, 19 January 2022 at 10:17:45 UTC, Ola Fosheim 
Grøstad wrote:
> On Wednesday, 19 January 2022 at 09:49:59 UTC, Nicholas Wilson 
> wrote:
>> Arguably that already describes Nvidia. Luckily for us, it has 
>> an intermediate layer in PTX that LLVM can target, and that's 
>> exactly what dcompute does.
>
> For desktop applications one has to support Intel, AMD, Nvidia, 
> Apple. So, does that mean that one has to support Metal, 
> Vulkan, PTX and ROCm? Sounds like too much…

That was a comment mostly about the market share and "business 
practices" of Nvidia.

Intel is well supported by OpenCL/SPIR-V.

There are some murmurings that AMD is getting SPIR-V support for 
ROCm. If that turns out to be insufficient, I don't think it would 
be too difficult to hook the AMDGPU backend up to LDC+DCompute 
(the runtime libraries would be a bit tedious, given the lack of 
familiarity and the volume of code), but I have no hardware to run 
ROCm on at the moment.

Metal should also not be too difficult to hook LDC up to (the 
kernel argument format is different, which is annoying). The main 
thing lacking is Objective-C support to bind the runtime libraries 
for DCompute (which would also need to be written).

LDC can already target Vulkan compute (although the pipeline is 
tedious, and there is no runtime library support).

>> Unlike C++, D can much more easily statically condition on 
>> aspects of the hardware, making the tuning process faster to 
>> navigate the parameter configuration space.
>
> Not sure what you meant here?

I mean there are parametric attributes of the hardware, for 
example cache size (or available registers for GPUs), that have a 
direct effect on how many times you can unroll the inner loop of, 
say, a windowing function, and you want to ship optimised code for 
multiple hardware configurations.
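
As a rough sketch of the idea, written in C++ since that is the 
comparison point: the unroll factor is a compile-time parameter, 
each instantiation is a separate copy of the loop, and a small 
dispatcher picks one based on the hardware. The names 
(`unrolled_sum`, `sum_for_registers`), the plain sum standing in 
for a windowing function, and the register thresholds are all made 
up for illustration, not taken from dcompute or any real tuning 
data.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Adds p[0] .. p[N-1] to acc, expanded at compile time (C++17 fold
// expression over an index pack), so there is no loop left at run time.
template <std::size_t... U>
inline void accumulate_block(const float* p, float& acc,
                             std::index_sequence<U...>) {
    ((acc += p[U]), ...);
}

// Sum of `data` with the inner loop unrolled `Unroll` times.
// Each distinct `Unroll` value produces its own copy of this function.
template <std::size_t Unroll>
float unrolled_sum(const std::vector<float>& data) {
    static_assert(Unroll > 0, "unroll factor must be positive");
    float acc = 0.0f;
    std::size_t i = 0;
    // Main loop: consume Unroll elements per iteration.
    for (; i + Unroll <= data.size(); i += Unroll) {
        accumulate_block(data.data() + i, acc,
                         std::make_index_sequence<Unroll>{});
    }
    // Remainder loop for the elements that do not fill a whole block.
    for (; i < data.size(); ++i) {
        acc += data[i];
    }
    return acc;
}

// Hypothetical dispatcher: choose a copy from an assumed register budget.
// The thresholds are placeholders, not measured values.
float sum_for_registers(const std::vector<float>& data, int registers) {
    if (registers >= 64) return unrolled_sum<8>(data);
    if (registers >= 32) return unrolled_sum<4>(data);
    return unrolled_sum<1>(data);
}
```

Every extra configuration is one more instantiation plus one more 
branch in the dispatcher, and the compile-time expansion needs the 
index-sequence helper.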

You can much more easily create multiple copies for different 
cache sizes (or register availability) in D than you can in C++, 
because static foreach and static if >>> if constexpr.