WordCount performance

Fri Mar 28 17:05:44 PDT 2008

Walter Bright wrote:
> Sean Kelly wrote:
>> Another option might be to check the thread count is greater than
>> 1 and only lock if it is.  Tango has a routine called thread_needLock
>> for this purpose, though it goes a bit farther and is true once a
>> thread has been created through program termination.  This avoids
>> problems with the situation where you have multiple threads running
>> and all but one terminate but memory is not yet synchronized.
> 
> You have to be very careful with such schemes because they can run afoul 
> of the notoriously difficult to comprehend "double checked locking" bug.

In this case though, if the thread count was 1 when checked it's safe to 
proceed without locking as long as you don't spawn any new threads. If 
it wasn't when checking, but became 1 afterwards it doesn't really 
matter; you may not have needed to lock it but it'll work fine if you 
do. (Just make sure the code path that locks also (unconditionally) 
unlocks, and of course that the code path that doesn't lock doesn't try 
to unlock either[1])

Oh, and the Phobos gc also has a thread_needLock() (in 
phobos/internal/gc/gcx.d).

[1]
---
if (!thread_needLock()) {
     foo();
} else synchronized(X) {
     foo();
}
---
does that nicely, and is what the GC uses. (Both in Phobos and Tango)

Hmm, reading the relevant parts of the GC code just now, something 
occurred to me. Some background first:
If allocation triggers an actual collection it checks thread_needLock() 
again, and locks for the duration of the collection if there's only one 
thread. The comment explains this is done because finalizers may start 
new threads.

However, the lock is then released *before* using the newly-collected 
heap to perform the actual allocation. That makes me wonder what happens 
if the first thing such a new thread does is allocating some memory...

In other words:
1) The only thread starts an allocation, determining the lock is not needed.
2) There's no space to allocate, and the GC prepares for a collection.
3) The GC notices the number of threads is 1, and acquires the lock, and 
starts performing the collection.
4) A finalizer starts a new thread, which attempts to allocate and 
blocks on the GC lock held by the collector.
5) The original thread finishes the collection and *releases the lock*.
6) It then determines what memory location to return from 'new'.
7) Thread switch, the second thread acquires the GC lock (which is no 
longer held by the first thread) and starts its own allocation 
activities. Since the original thread didn't yet mark its chosen memory 
as allocated, the second thread picks the same memory and marks it as 
allocated.
8) Thread switch, the first thread finishes its allocation by marking 
the memory as allocated. (even though it already was marked by the 
second thread)
9) Both threads start using the same piece of memory as if they have the 
only reference to it (which should be true from either perspective, 
since it was just returned from 'new').
10) *KABOOM*

(I may have just proven your point about these things being very hard to 
get right...)