Scientific computing and parallel computing C++23/C++26
Bruce Carneal
bcarneal at gmail.com
Sat Jan 15 10:52:29 UTC 2022
On Saturday, 15 January 2022 at 09:03:11 UTC, Nicholas Wilson
wrote:
> On Saturday, 15 January 2022 at 08:01:15 UTC, Paulo Pinto wrote:
>> On Saturday, 15 January 2022 at 00:29:20 UTC, Nicholas Wilson
>> wrote:
>>> ....
>>>
>>> Definitely. Homogenous memory is interesting for the ability
>>> to make GPUs do the things GPUs are good at and leave the
>>> rest to the CPU without worrying about memory transfer across
>>> the PCI-e. Something which CUDA can't take advantage of on
>>> account of nvidia GPUs being only discrete. I've no idea how
>>> cacheing work in a system like that though.
>>> ...
>>
>> How is this different from unified memory?
>>
>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-unified-memory-programming-hd
>
> there is still a PCI-e bus in between. Fundamentally the memory
> must exist in either the CPU's RAM or the GPU's (V)RAM; from
> what I understand, unified memory allows the GPU to access the
> host RAM with the same pointer. This reduces the total memory
> consumed by the program, but to get to the GPU the data must
> still cross the PCI-e bus.
Yes. You also gain some simplification from unified memory if
your data structures are pointer-heavy.
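To make the pointer-heavy case concrete, here is a minimal sketch of the unified-memory pattern under discussion, using the CUDA runtime's cudaMallocManaged. The Node structure, kernel, and sizes are illustrative, not from the thread; the point is that the same pointers are valid on host and device, so a linked structure needs no pointer translation before a kernel walks it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative linked-node structure: with managed memory the same
// pointers are valid on both host and device, so pointer-heavy data
// needs no translation step before a kernel uses it.
struct Node {
    float value;
    Node *next;
};

__global__ void scale_chain(Node *head, float s) {
    // A single thread walks the list; fine for a small illustration.
    for (Node *n = head; n != nullptr; n = n->next)
        n->value *= s;
}

int main() {
    const int count = 4;
    Node *nodes = nullptr;
    // One allocation visible to both CPU and GPU; on a discrete GPU
    // the driver still migrates pages across PCI-e on demand -- the
    // copy happens, it is just implicit.
    cudaMallocManaged(&nodes, count * sizeof(Node));
    for (int i = 0; i < count; ++i) {
        nodes[i].value = float(i);
        nodes[i].next  = (i + 1 < count) ? &nodes[i + 1] : nullptr;
    }

    scale_chain<<<1, 1>>>(nodes, 2.0f);
    cudaDeviceSynchronize();  // wait, then read the results on the host

    for (int i = 0; i < count; ++i)
        printf("%g\n", nodes[i].value);
    cudaFree(nodes);
    return 0;
}
```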
I've tried to gain an advantage from GPU-side pulls across the
bus in the past but could never win out over explicit async
copies using the dedicated copy engines. Others, particularly
those with high compute-to-load/store ratios, may have had
better luck.
For reference, I've only been able to get a little over 80% of
the advertised PCI-e peak bandwidth out of the dedicated Nvidia
copy HW.
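The explicit-copy approach can be sketched as follows; buffer size and stream setup are illustrative. Pinned (page-locked) host memory is what lets cudaMemcpyAsync run on the dedicated copy engines at full rate, and CUDA events give the achieved bandwidth, which you can compare against the advertised PCI-e peak as above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256u << 20;  // 256 MiB, illustrative size

    // Pinned host memory: required for truly asynchronous transfers
    // and for the copy engines to approach peak PCI-e throughput.
    float *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, bytes);
    cudaMalloc(&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time one host-to-device transfer driven by the copy engine.
    cudaEventRecord(start, stream);
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // E.g. against ~15.75 GB/s for a PCIe 3.0 x16 link, ~80% of
    // peak would be roughly 12.6 GB/s.
    printf("achieved: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```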
More information about the Digitalmars-d mailing list