Bartosz Milewski Missing post

Thu May 28 11:02:57 PDT 2009

On Thu, 28 May 2009 13:36:28 -0400, Denis Koroskin <2korden at gmail.com>  
wrote:

> On Thu, 28 May 2009 21:07:57 +0400, Robert Jacques <sandford at jhu.edu>  
> wrote:
>
>> On Thu, 28 May 2009 12:45:41 -0400, Denis Koroskin <2korden at gmail.com>
>> wrote:
>>
>>> On Thu, 28 May 2009 20:32:29 +0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> BCS wrote:
>>>>> Everything is indicating that shared memory multi-threading is where
>>>>> it's all going.
>>>>
>>>> That is correct, just that it's 40 years late. Right now everything is
>>>> indicating that things are moving *away* from shared memory.
>>>>
>>>> Andrei
>>>
>>> That's true.
>>>
>>> For example, we develop for PS3, and its 7 SPU cores have 256KiB of TLS
>>> each (which is as fast as L2 cache) and no direct shared memory access.
>>> Shared memory needs to be requested via asynchronous memcpy requests,
>>> and this scheme doesn't work with OOP well: even after you transfer
>>> some object, its vtbl etc still point to shared memory.
>>>
>>> We had a hard time re-arranging our data so that an object and everything
>>> it owns (and points to) is stored sequentially in a single large block of
>>> memory.
>>> This also resulted in replacing most of the pointers with relative
>>> offsets.
>>>
>>> Parallelization is hard, but the result is worth the trouble.
>>
>> I agree that Andrei's right, but your example is wrong. The Cell's SPUs
>> are SIMD vector processors, not general CPUs. I also work with vector
>> processors (NVIDIA's CUDA) but every software/hardware iteration gets
>> further and further away from pure vector processing. Rumor has it that
>> NVIDIA's next chip will be MIMD instead of SIMD.
>
> I wanted to stress that multicore PUs tend to have their own local  
> memory (small but fast) and little or no global (shared) memory access  
> (it is inefficient and error-prone: race conditions et al.)
>
> I believe the SIMD/MIMD discussion is irrelevant here. It's all about  
> Shared/Distributed Memory Model. MIMD devices can be both  
> (http://en.wikipedia.org/wiki/MIMD)

Well, I thought you were making a different point. Really, the Cell SPU is  
the only current PU with the design you're talking about. All commercial  
CPUs and GPUs have very large global memory buses.  Every blog and talk  
I've read/attended has painted the SPU in a very negative light, at least  
with regard to the programming model. (Which makes sense, since it's sorta  
like non-cache-coherent NUMA, which pretty much everyone has decided is a  
bad idea.)
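
For what it's worth, the layout Denis describes above (an object and
everything it owns packed into one block, with relative offsets in place of
pointers) can be sketched roughly as follows. This is a minimal sketch in C
under stated assumptions, not his team's code: Mesh, Block, mesh_pack,
mesh_name and mesh_value are hypothetical names, and a plain memcpy stands in
for the SPU's asynchronous transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration; nothing here comes from a real SDK. */
typedef struct {
    uint32_t values_off;   /* byte offset of the float array from block start */
    uint32_t value_count;  /* number of floats stored in the block */
    uint32_t name_off;     /* byte offset of the NUL-terminated name */
    uint32_t total_size;   /* bytes a single bulk copy has to move */
} Mesh;

/* One contiguous block: the header sits at offset 0, the payload follows. */
typedef union {
    Mesh header;
    unsigned char bytes[256];
} Block;

/* Pack header, values and name back to back.
   Returns the number of bytes used, or 0 if the data does not fit. */
static size_t mesh_pack(Block *b, const char *name,
                        const float *values, uint32_t count)
{
    size_t name_len, values_off, name_off, total;

    if (count > sizeof b->bytes / sizeof(float))
        return 0;
    name_len = strlen(name) + 1;
    values_off = sizeof(Mesh);
    name_off = values_off + (size_t)count * sizeof(float);
    total = name_off + name_len;
    if (total > sizeof b->bytes)
        return 0;

    memcpy(b->bytes + values_off, values, (size_t)count * sizeof(float));
    memcpy(b->bytes + name_off, name, name_len);
    b->header.values_off = (uint32_t)values_off;
    b->header.value_count = count;
    b->header.name_off = (uint32_t)name_off;
    b->header.total_size = (uint32_t)total;
    return total;
}

/* Offsets are resolved against wherever the block currently lives. */
static const char *mesh_name(const Mesh *m)
{
    return (const char *)m + m->name_off;
}

static float mesh_value(const Mesh *m, uint32_t i)
{
    float v;
    assert(i < m->value_count);
    memcpy(&v, (const unsigned char *)m + m->values_off + (size_t)i * sizeof v,
           sizeof v);
    return v;
}
```

Copying total_size bytes of a packed Block to another address and calling
mesh_name or mesh_value on the copy still works, because nothing in the block
refers to the address it was built at. With raw pointers, the copied header
would still point back into the source memory.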