Scientific computing and parallel computing C++23/C++26

Bruce Carneal bcarneal at gmail.com
Thu Jan 20 09:18:29 UTC 2022


On Thursday, 20 January 2022 at 04:01:09 UTC, Araq wrote:
> On Thursday, 20 January 2022 at 00:43:30 UTC, Nicholas Wilson 
> wrote:
>
>> I mean there are parametric attributes of the hardware, say 
>> for example cache size (or available registers for GPUs), that 
>> have a direct effect on how many times you can unroll the 
>> inner loop, say for a windowing function, and you want to ship 
>> optimised  code for multiple configurations of hardware.
>>
>> You can much more easily create multiple copies for different 
>> sized cache (or register availability) in D than you can in 
>> C++, because static foreach and static if >>> if constexpr.
>
> And you can do that even more easily with an AST macro system. 
> Which Julia has...

Given this endorsement, I started reading up on Julia/GPU...  Here 
are a few things that I found:
A gentle tutorial: 
https://nextjournal.com/sdanisch/julia-gpu-programming
Another, more concise: 
https://juliagpu.gitlab.io/CUDA.jl/tutorials/introduction/

For those who are video-oriented, here's a recent workshop:
https://www.youtube.com/watch?v=Hz9IMJuW5hU

While I admit to just skimming that very long video, I was 
impressed by the tooling on display and the friendly presentation.

In short, I found a lot to like about Julia from the above and 
other writings, but the material on Julia AST macros specifically 
was ... underwhelming.  AST macros look like an inferior tool in 
this low-level setting.  They are slightly less readable to me 
than the dcompute alternatives without offering any compensating 
gain in performance.
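
To make the comparison concrete, here is a minimal D sketch of the 
static foreach specialization Nicholas describes above.  The kernel 
name, unroll factors, and windowing computation are my own 
placeholders for illustration, not dcompute code; the point is only 
that the specialization is ordinary D, readable without a macro layer.

// Minimal sketch (names and sizes are illustrative, not from dcompute):
// instantiate one copy of the inner loop per unroll factor at compile time.
import std.stdio;

void windowedSum(size_t unroll)(const float[] input, float[] output)
{
    foreach (i; 0 .. input.length / unroll)
    {
        float acc = 0;
        // static foreach expands to 'unroll' additions with no runtime loop.
        static foreach (u; 0 .. unroll)
            acc += input[i * unroll + u];
        output[i] = acc;
    }
}

void main()
{
    float[] data = [1, 2, 3, 4, 5, 6, 7, 8];

    // Pick the specialization at compile time; a real build might
    // branch on a cache-size or register-count enum with static if.
    enum unroll = 4;
    auto result = new float[](data.length / unroll);
    windowedSum!unroll(data, result);
    writeln(result); // prints [10, 26]
}

The same pattern scales to several hardware configurations by wrapping 
the instantiation in static if branches keyed on the target's 
parameters, which is the ease-of-specialization argument quoted above.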



