Scientific computing and parallel computing C++23/C++26

Sat Jan 15 18:48:32 UTC 2022

On Saturday, 15 January 2022 at 17:29:35 UTC, Guillaume Piolat 
wrote:
> On Saturday, 15 January 2022 at 12:21:37 UTC, Ola Fosheim 
> Grøstad wrote:
>>
>>> Definitely. Homogenous memory is interesting for the ability 
>>> to make GPUs do the things GPUs are good at and leave the 
>>> rest to the CPU without worrying about memory transfer across 
>>> the PCI-e. Something which CUDA can't take advantage of on 
>>> account of nvidia GPUs being only discrete.
>>
>> Steam Deck, which appears to come out next month, seems to run 
>> under Linux and has an "AMD APU" with a modern GPU and CPU 
>> integrated on the same chip
>
> Related: has anyone here seen an actual measured performance 
> gain from co-located CPU and GPU on the same chip? I used to 
> test with OpenCL + Intel SoC and again, it was underwhelming 
> and not faster. I'd be happy to know about other experiences.

The vkpolybench repository linked below includes graphs for 
integrated GPUs, among others. They show significant speedups 
(greater than the SIMD width) relative to a single CPU core on 
many of the benchmarks, but break-even or worse on a few. 
Reports of real-world experience with integrated accelerators 
would be better.

https://github.com/ElsevierSoftwareX/SOFTX_2020_86

On paper, at least, it looks like SoC GPU performance will be 
severely impacted by the working set size, but what isn't? 
Currently it also looks like the dcompute/SoC-GPU version will 
beat my SIMD variant, but it'll be at least a few months 
before I have hard data to share.

Does anyone out there have real-world data now?