Unofficial wish list status. (Jul 2008)

Sean Kelly sean at invisibleduck.org
Fri Jul 4 09:40:13 PDT 2008


== Quote from Me Here (p9e883002 at sneakemail.com)'s article
> superdan wrote:
> > Oskar Linde Wrote:
> >
> > > superdan wrote:
> > > > Me Here Wrote:
> > > >
> > > >> Walter Bright wrote:
> > > >>
> > > >>> Yes, but the onus will be on you (the programmer) to prevent data
> > > >>> races and do proper synchronization.
> > > >>
> > > >> In the scenario described, the main thread initialises the array of
> > > >> data. Then, non-overlapping slices of that are portioned out to N
> > > >> worker threads. Only one thread ever modifies any given segment. When
> > > >> the worker threads are complete, the 'results' are left in the original
> > > >> array, available in its entirety only to the main thread.
> > > >>
> > > >>> You have to be very wary of cache effects when
> > > >>> writing data in one thread and expecting to see it in another.
> > > >>
> > > >> Are you saying that there is some combination of OS and/or hardware
> > > >> L1/L2 caching that would allow one thread to read a memory location
> > > >> (previously) modified by another thread, and see 'old data'?
> > > >>
> > > >> Cos if you are, it's a deeply serious bug that, if it's not already
> > > >> very well documented by the OS writer or hardware manufacturers, then
> > > >> here's your chance to get slashdotted (and diggited and reddited etc.,
> > > >> all concurrently) as the discoverer of a fatal processor flaw.
> > > >
> > > > google for "relaxed memory consistency model" or "memory barriers". geez.
> > >
> > > I presume the discussion regards symmetric multiprocessing (SMP).
> > >
> > > Cache coherency is a very important element of any SMP design. It
> > > basically means that caches should be fully transparent, i.e. the
> > > behavior should not change by the addition or removal of caches.
> >
> > you are perfectly correct... as of ten years ago. you are right in that cache
> > coherency protocols ensure the memory model is respected regardless of adding
> > or eliminating caches. (i should know coz i implemented a couple for a
> > simulator.) the problem is that the memory model has been aggressively
> > changed recently towards providing less and less implied ordering and
> > requiring programs to write explicit synchronization directives.
> >
> > > So the above scenario should never occur. If thread A writes something
> > > prior to thread B reading it, B should never get the old value.
> >
> > yeah the problem is it's hard to define what "prior" means.
> >
> > > "Memory barriers" have nothing to do with cache consistency. A memory
> > > barrier only prevents a single CPU thread from reordering load/store
> > > instructions across that specific barrier.
> >
> > memory barriers strengthen the relaxed memory model that was pushed
> > aggressively by the need for faster caches.
> Since, in the scenario I describe, each thread or cpu is dealing with a single
> section of memory, and each section of memory is being dealt with by a single
> thread or cpu, there is effectively no shared state whilst the threads run. Hence
> no possibility of cache inconsistency due to pipeline reordering. Ie.
> main thread populates a[ 0 .. 1000 ];
> for thread 1 .. 10
>     spawn( thread, \a[ ((thread-1) * 100) .. (thread * 100) ] );
> main thread waits for all threads to terminate;
> main thread does something with a[];
> In any case, cache consistency issues due to pipeline reordering do not survive
> context switches, so the issue is a non-issue for the purposes of the
> discussion at hand. Ie. threading.

Multithreading on a single-CPU machine is always fairly safe and predictable
because all threads share the same cache, etc.  Even most popular multicore
machines today are relatively safe because in most instances the cores share
at least the L2+ caches, sidestepping many typical SMP issues.  But multiple
CPUs in a machine introduce an entirely new set of issues, and it's these that
concurrent programmers must consider.  For example, here's one fun issue that
can occur under processor consistency (PC), the memory model that the IA-32
(ie. x86) was long thought to follow:

x = y = 0;

// thread A
x = 1;

// thread B
if( x == 1 )
    y = 1;

// thread C
if( y == 1 )
    assert( x == 1 ); // may fail

The issue with PC described above is that while each CPU observes the actions
of any other single CPU in a specific order, all CPUs are not guaranteed to observe
the actions of other CPUs simultaneously.  So it's possible that thread B may observe
thread A's store of 1 to x (and store 1 to y as a result) before thread C sees that
same store, leaving thread C to see y == 1 while still reading the old value of x.

Fortunately, Intel has recently gotten a lot more proactive about facilitating SMP,
and during the C++0x memory model discussions it was verified that the above
behavior will in fact not occur on current Intel architectures.  But there are a lot
of weird little issues like this that can lead to surprising behavior, even on an
architecture with a fairly strong memory model.
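
To make that concrete: the eventual cure, under the C++0x model and in today's D,
is to make the contended accesses explicitly atomic with sequentially consistent
ordering, which rules out the anomalous outcome no matter how the hardware orders
things underneath.  Here's a minimal sketch assuming present-day D with druntime's
core.atomic and core.thread; the module names and APIs are as they exist now, not
necessarily as they were at the time of this thread, so treat it as an illustration:

import core.atomic;
import core.thread;

shared int x = 0;
shared int y = 0;

void threadA()
{
    atomicStore( x, 1 );            // sequentially consistent store
}

void threadB()
{
    if( atomicLoad( x ) == 1 )      // sequentially consistent load
        atomicStore( y, 1 );
}

void threadC()
{
    if( atomicLoad( y ) == 1 )
    {
        // cannot fail: seq-cst operations share a single global order
        assert( atomicLoad( x ) == 1 );
    }
}

void main()
{
    auto a = new Thread( &threadA );
    auto b = new Thread( &threadB );
    auto c = new Thread( &threadC );
    a.start(); b.start(); c.start();
    a.join(); b.join(); c.join();
}

Weaker orderings and explicit fences exist for when the cost matters, but
sequentially consistent atomics are the safe default when in doubt.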

> Pipelines cover single-digit or low double-digit runs of non-branching
> instructions at most. A context switch consists of hundreds if not thousands of
> instructions on all but the most highly tuned of real-time kernels. This is a
> very localised issue for the compiler writer, not the application programmer,
> to worry about.
> I know Walter *is* a compiler writer, but this is a complete red herring in the
> context of this discussion.

As above, once there is more than one CPU in a box one can no longer rely on
context switching to provide a convenient "quiescent state," so I think that
you're providing false assurances here.
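
For the quoted scenario itself, the reliable fix is simply to make the hand-off
explicit: joining each worker (or using any comparable synchronization primitive)
is what establishes the visibility guarantee for the workers' writes, not whatever
context switches happen along the way.  A rough sketch in present-day D follows;
the slicing arithmetic, helper names, and use of core.thread are my own
illustration, not code from this thread:

import core.thread;

// double every element of the given slice; only this worker ever touches it
void process( int[] slice )
{
    foreach( ref v; slice )
        v *= 2;
}

// helper so each delegate captures its own slice (one closure per call)
Thread makeWorker( int[] slice )
{
    return new Thread( () => process( slice ) );
}

void main()
{
    auto a = new int[1000];
    foreach( i, ref v; a )          // main thread initialises the array
        v = cast(int) i;

    Thread[] workers;
    foreach( t; 0 .. 10 )
    {
        auto w = makeWorker( a[t * 100 .. (t + 1) * 100] );  // non-overlapping slice
        w.start();
        workers ~= w;
    }

    // join(), not an incidental context switch, is the synchronization point
    // that makes the workers' writes visible to the main thread
    foreach( w; workers )
        w.join();

    // a[] may now be read in its entirety by the main thread
}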


Sean


