Race conditions in critical.c?

Tue Sep 30 12:13:56 PDT 2008

When I was rewriting monitor.c in D, I noticed the race in critical.c 
but decided not to work on it. Instead I've been trying to convince 
Walter to scrap critical.c altogether. Here's why:

How do you interpret the following fragment of code? (It's a simplified 
version of actual code written by a D programmer.)

class Foo {
    int i, j;
    void a() {
       synchronized {
          int x = ++i;
          j = x;
       }
    }
    void b() {
       synchronized {
          int x = --i;
          j = x;
       }
    }
}

Does it guarantee that i==j? If you say yes, you subscribe to one of the 
schools of thought that argue that "synchronized" in this context either 
means synchronized(this) or synchronized(this.classinfo), or 
synchronized(SomeHiddenGlobalSingleton).

Wrong! The code in critical.c creates a separate new critical section 
for every synchronized block of code. So a() and b() use _different_ 
locks and their execution may interleave.

If you think this is bizarre bordering on reckless I'm with you.

Mike Hearn wrote:
>>>
>>> I was reading the code to the phobos runtime library and saw the following code in critical.c:
>>>
>>> void _d_criticalenter(D_CRITICAL_SECTION *dcs)
>>> {
>>>     if (!dcs->next)
>>>     {
>>>         EnterCriticalSection(&critical_section.cs);
>>>         if (!dcs->next) // if, in the meantime, another thread didn't set it
>>>         {
>>>             dcs->next = dcs_list;
>>>                  <----   
>>>             dcs_list = dcs;      
>>>             InitializeCriticalSection(&dcs->cs);
>>>         }
>>>         LeaveCriticalSection(&critical_section.cs);
>>>     }
>>>     EnterCriticalSection(&dcs->cs);
>>> }
>>>
>>> There's a race condition here - consider a thread that is suspended at the marked point. Another thread then attempts to lock the critsec. It sees that dcs->next is set and calls EnterCriticalSection on an uninitialized CRITICAL_SECTION.
>>>
>>> The Linux code has the same problem, and then another one as well - the initialization of the "critsec lock" is itself lazy and is not protected by anything at all. The result is potentially multiple initializations of that lock, and because atexit does not coalesce multiple registrations, a potentially catastrophic multiple invocation of _STD_critical_term.
>>>
>>> It appears this code (for both platforms) needs to be scrapped and rewritten. I'd start by checking that lazy initialization of critsecs is actually needed - do D compilers really generate tons of potentially unnecessary D_CRITICAL_SECTIONs?
>>>
>>> thanks -mike
>