Scientific computing and parallel computing C++23/C++26

Nicholas Wilson iamthewilsonator at hotmail.com
Sat Jan 15 00:29:20 UTC 2022


On Friday, 14 January 2022 at 15:17:59 UTC, Ola Fosheim Grøstad 
wrote:
> *nods* For a long time we could expect "home computers" 
> to be Intel/AMD, but then the computing environment changed and 
> maybe Apple tries to make its own platform stand out as faster 
> than it is by forcing developers to special case their code for 
> Metal rather than going through a generic API.
>
> I guess FPGAs will be available in entry level machines at some 
> point as well. So, I understand that it will be a challenge to 
> get *dcompute* to a "ready for the public" stage when there is 
> no multi-person team behind it.

Maybe, though I suspect not for a while; that could be wildly 
wrong, of course. Anyway, I don't think FPGAs will be too 
difficult to support, provided the vendor in question ships an 
OpenCL implementation. The only thing left to do would be 
supporting `pipe`s.

As for manpower: the reason is simply that I don't have any 
particular personal need for dcompute these days. I'm happy to 
implement features for people who need something specific, e.g. 
Vulkan compute shaders or textures, and PRs are welcome. Though 
if Bruce makes millions and gives me a job, that will obviously 
change ;)

> But I am not so sure about the apples and oranges aspect of it.

The apples to oranges comment was about benchmarking CPU vs. 
GPU: there are so many factors that make performance comparisons 
(more) difficult. Is the GPU discrete? How important is latency 
vs. throughput? How "powerful" is the GPU compared to the CPU? 
How well suited to the task is the GPU? The list goes on. It's 
hard enough to do CPU benchmarks in an unbiased way.

If the intention is to say, "look at the speedup you can get for 
$TASK using $COMMON_HARDWARE", then yeah, that would be 
possible. It would certainly be possible to do a benchmark of, 
say, "ease of implementation with comparable performance" of 
dcompute vs. CUDA, e.g. LoC, verbosity, brittleness etc., since 
the main advantage of D/dcompute (vs. CUDA) is enumeration of 
kernel designs for performance. That would give a nice 
measurable goal for improving usability.

> The presentation by Bryce was quite explicitly focusing on 
> making GPU computation available at the same level as CPU 
> computations (sans function pointers). This should be possible 
> for homogeneous memory systems (GPU and CPU sharing the same 
> memory bus) in a rather transparent manner and languages that 
> plan for this might be perceived as being much more productive 
> and performant if/when this becomes reality. And C++23 isn't 
> far away, if they make the deadline.

Definitely. Homogeneous memory is interesting for the ability to 
make GPUs do the things GPUs are good at and leave the rest to 
the CPU, without worrying about memory transfers across PCIe. 
That is something CUDA can't take advantage of, on account of 
NVIDIA GPUs being discrete only. I've no idea how caching works 
in a system like that, though.

> It was also interesting to me that ISO C23 will provide custom 
> bit width integers and that this would make it easier to 
> efficiently compile C-code to tighter FPGA logic. I remember 
> that LLVM used to have that in their IR, but I think it was 
> taken out and limited to more conventional bit sizes?

Arbitrary-precision integers are still a part of LLVM, and I 
presume LLVM IR. The problem is that, as with address-spaced 
pointers, D has no way to declare such types. I seem to remember 
Luís Marques doing something crazy like that (maybe in a DConf 
presentation?), compiling D to Verilog.

> It just shows that being a system-level programming language 
> requires a lot of adaptability over time and frameworks like 
> *dcompute* cannot ever be considered truly finished.

Of course.
