Scientific computing and parallel computing C++23/C++26
bcarneal at gmail.com
Thu Jan 20 09:18:29 UTC 2022
On Thursday, 20 January 2022 at 04:01:09 UTC, Araq wrote:
> On Thursday, 20 January 2022 at 00:43:30 UTC, Nicholas Wilson wrote:
>> I mean there are parametric attributes of the hardware, say
>> for example cache size (or available registers for GPUs), that
>> have a direct effect on how many times you can unroll the
>> inner loop, say for a windowing function, and you want to ship
>> optimised code for multiple configurations of hardware.
>> You can much more easily create multiple copies for different
>> sized cache (or register availability) in D than you can in
>> C++, because static foreach and static if >>> if constexpr.
> And you can do that even more easily with an AST macro system.
> Which Julia has...
Given this endorsement I started reading up on Julia/GPU... Here
are a few things that I found:
A gentle tutorial:
Another, more concise:
For those that are video oriented, here's a recent workshop:
While I admit to just skimming that very long video, I was
impressed by the tooling on display and the friendly presentation.
In short, I found a lot to like about Julia from the above and
other writings but the material on Julia AST macros specifically
was ... underwhelming. AST macros look like an inferior tool in
this low-level setting. They are slightly less readable to me
than the dcompute alternatives without offering any compensating
gain in performance.
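
For anyone following along from the C++ side of the thread, here is a
minimal sketch of the kind of thing Nicholas describes in the quote
above: one source, several compile-time variants of an inner loop over
a window. It is C++17 with if constexpr, not dcompute and not Julia,
and the names (windowed_sum, sum_unrolled, the Unroll flag) are
hypothetical, made up for illustration. The flag stands in for a
choice you would really derive from cache size or register count.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Fully unrolled sum of p[0] .. p[N-1] via a fold expression.
// (Hypothetical helper, not from any library.)
template <std::size_t... Is>
inline float sum_unrolled(const float* p, std::index_sequence<Is...>) {
    return (0.0f + ... + p[Is]);
}

// Sliding-window sum: out[i] = in[i] + ... + in[i + Window - 1].
// Window and Unroll are compile-time parameters, so each combination
// used in the program is a separate copy of the inner loop.
template <std::size_t Window, bool Unroll>
std::vector<float> windowed_sum(const std::vector<float>& in) {
    static_assert(Window > 0, "window must be non-empty");
    std::vector<float> out;
    if (in.size() < Window) return out;  // no full window fits
    out.reserve(in.size() - Window + 1);
    for (std::size_t i = 0; i + Window <= in.size(); ++i) {
        if constexpr (Unroll) {
            // Variant for hardware with registers to spare.
            out.push_back(
                sum_unrolled(&in[i], std::make_index_sequence<Window>{}));
        } else {
            // Plain loop variant; the compiler decides on unrolling.
            float acc = 0.0f;
            for (std::size_t k = 0; k < Window; ++k) acc += in[i + k];
            out.push_back(acc);
        }
    }
    assert(out.size() == in.size() - Window + 1);
    return out;
}

// Run-time dispatch between the pre-built variants. The bool argument
// is an assumed stand-in for a real hardware query.
inline std::vector<float> windowed_sum4(const std::vector<float>& in,
                                        bool plenty_of_registers) {
    return plenty_of_registers ? windowed_sum<4, true>(in)
                               : windowed_sum<4, false>(in);
}
```

Both variants compute the same result; only the shape of the generated
inner loop differs. Whether the D static foreach / static if version of
this reads better is the point under discussion, and this sketch does
not settle it.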