Scientific computing and parallel computing C++23/C++26
Bruce Carneal
bcarneal at gmail.com
Thu Jan 20 12:18:27 UTC 2022
On Thursday, 20 January 2022 at 08:36:32 UTC, Ola Fosheim Grøstad
wrote:
> On Thursday, 20 January 2022 at 08:20:58 UTC, Nicholas Wilson
> wrote:
>> Now you've confused me. You can select which implementation
>> to use at runtime with e.g. CPUID or more sophisticated
>> methods. LDC targeting DCompute can produce multiple objects
>> with the same compiler invocation, i.e. you can get CUDA for
>> any set of SM version, OpenCL compatible SPIR-V which you can
>> get per GPU, inspect its hardware characteristics and then
>> select which of your kernels to run.
>
> Yes, so why do you need compile time features?
Because compilers are not sufficiently advanced to extract all
the performance that is available on their own.
A good example of where the automated/simple approach was not
good enough is CUB (CUDA UnBound), a high-performance CUDA
library found here: https://github.com/NVIDIA/cub/tree/main/cub
I'd recommend taking a look at the specializations CUB carries
out in the name of performance. D's compile-time features can
help reduce this kind of mess, both in extreme-performance
libraries and in extreme-performance application code.
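To make that concrete, here is a minimal sketch (not taken from
CUB; the SM thresholds and tuning numbers are made up) of how D's
static if can select per-architecture tuning parameters at compile
time, the kind of thing CUB does with stacks of C++ template
specializations:

struct Policy(int smVersion)
{
    // Pick tuning constants per GPU architecture at compile time.
    // The thresholds and values below are illustrative only.
    static if (smVersion >= 80)       // e.g. Ampere
    {
        enum blockThreads   = 256;
        enum itemsPerThread = 16;
    }
    else static if (smVersion >= 60)  // e.g. Pascal
    {
        enum blockThreads   = 128;
        enum itemsPerThread = 8;
    }
    else                              // older architectures
    {
        enum blockThreads   = 64;
        enum itemsPerThread = 4;
    }
}

unittest
{
    alias P = Policy!80;
    static assert(P.blockThreads == 256 && P.itemsPerThread == 16);
}

One Policy template replaces the pile of hand-written
per-architecture specializations you'd otherwise maintain.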
>
> My understanding is that the goal of nvc++ is to compile to CPU
> or GPU based on what pays off more for the actual code. So it
> will not need any annotations (it is up to the compiler to
> choose between CPU/GPU?). Bryce suggested that it currently
> only targets one specific GPU, but that it will target multiple
> GPUs for the same executable in the future.
>
> The goal for C++ parallelism is to make it fairly transparent
> to the programmer. Or did I misunderstand what he said?
I think that is an entirely reasonable goal, but such
transparency may cost performance, and any such cost will be
unacceptable to some.
>
> My viewpoint is that if one is going to take a performance hit
> by not writing the shaders manually, one needs to get maximum
> convenience as a payoff.
>
> It should be an alternative for programmers that cannot afford
> to put in the extra time to support GPU compute manually.
Yes. It's always good to have alternatives. Fully automated is one
option, hinted is a second, and meta-programming-assisted manual
is a third.
>
>
>>> If you have to do the unrolling in D, then a lot of the
>>> advantage is lost and I might just as well write in a shader
>>> language...
>>
>> D can be your compute shading language for Vulkan and with a
>> bit of work whatever you'd use HLSL for, it can also be your
>> compute kernel language substituting for OpenCL and CUDA.
>
> I still don't understand why you would need static if/static
> for-loops? Seems to me that this is too hardwired, you'd be
> better off with compiler unrolling hints (C++ has these) if the
> compiler does the wrong thing.
If you can achieve your performance objectives with automated or
hinted solutions, great! But what if you can't? Most people
will not have to go as hardcore as the CUB authors did to get the
performance they need, but I quite often find myself wanting more
than the compiler can easily give me. I'm very happy to have the
meta-programming tools to factor and reduce these "manual"
programming tasks. A sketch of what I mean follows.
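As an illustration (the unroll factor and loop body here are
hypothetical, not from any real kernel), static foreach gives you
guaranteed unrolling rather than a hint the compiler is free to
ignore:

enum unrollFactor = 4;

void axpy(float a, const(float)[] x, float[] y)
{
    size_t i = 0;
    // The static foreach is expanded at compile time into
    // unrollFactor copies of the body; no hint to honor or ignore.
    for (; i + unrollFactor <= x.length; i += unrollFactor)
    {
        static foreach (j; 0 .. unrollFactor)
            y[i + j] += a * x[i + j];
    }
    // Scalar tail for the remaining elements.
    for (; i < x.length; ++i)
        y[i] += a * x[i];
}

Change unrollFactor and the generated code changes with it;
nothing is left to the optimizer's discretion.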
>
>
>> Same caveats apply for Metal (should be pretty easy to do:
>> need Objective-C support in LDC, need Metal bindings).
>
> Use clang to compile the objective-c code to object files and
> link with it?