I think a race condition exists in the Tango & Phobos GC code

redsea redsea at 163.com
Sun Sep 7 01:50:34 PDT 2008


I have a program written in D that runs 24/7. I found that it blocks once or twice a week (without using any CPU). Whenever I attach strace to check whether it is blocked in a system call, it continues running (strange?).

I can also resume it with kill -SIGUSR2, so I think this situation may be associated with the GC. But why does strace help? I checked the strace code and found that it causes SIGSTOP to be sent, and SIGSTOP cannot be blocked by a signal mask.

Then I checked the library, and I think the problem may be caused by the following execution order:

   thread A:                                              thread B:     
   
   fullcollect 
      thread_suspendAll
          suspend                                 
                                                               thread_suspendHandler
                                                               sem_post( &suspendCount );

               ret from sem_wait( &suspendCount );   
      do collect
      
      thread_resumeAll
               !! this signal would be lost
               pthread_kill( t.m_addr, SIGUSR2 )
                                                              
                                                               sigsuspend( &sigres );         

Thread B then blocks in sigsuspend, because the SIGUSR2 was sent before it started waiting and is lost.

Then I checked the Phobos code, and it is similar.

Now I am trying to use a semaphore to do the resume, and will check whether my program runs correctly.
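
A minimal sketch of the semaphore-based resume idea, written in C against the POSIX calls rather than taken from the Tango or Phobos sources. The names suspend_ack, resume_sem, worker_ready, suspend_handler and run_demo are hypothetical and only serve the illustration. The point is that a sem_post is counted even if it happens before the other thread reaches sem_wait, so the resume cannot be missed the way a signal sent before sigsuspend might be. One caveat: sem_post is on the POSIX async-signal-safe list, but sem_wait is not, so waiting on a semaphore inside a signal handler is an assumption that should be verified on the target platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static sem_t suspend_ack;   /* posted by the handler: "I am stopped"   */
static sem_t resume_sem;    /* posted by the collector: "you may run"  */
static sem_t worker_ready;  /* posted by the worker once it is running */
static atomic_int worker_done;
static atomic_int resumed_count;

/* Runs in the suspended thread (thread B in the diagram above). */
static void suspend_handler(int sig)
{
    int saved_errno = errno;
    (void)sig;
    sem_post(&suspend_ack);
    /* Assumption: sem_wait is usable here. POSIX does not list it as
       async-signal-safe, so check this on the target platform. */
    while (sem_wait(&resume_sem) == -1 && errno == EINTR)
        ;
    atomic_fetch_add(&resumed_count, 1);
    errno = saved_errno;
}

static void *worker(void *arg)
{
    (void)arg;
    sem_post(&worker_ready);
    while (!atomic_load(&worker_done))
        sched_yield();
    return NULL;
}

/* Suspends and resumes one worker thread once.
   Returns how many times the worker was resumed, or -1 on setup failure. */
int run_demo(void)
{
    struct sigaction sa;
    pthread_t t;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = suspend_handler;
    sigfillset(&sa.sa_mask);          /* block other signals in the handler */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (sem_init(&suspend_ack, 0, 0) != 0 ||
        sem_init(&resume_sem, 0, 0) != 0 ||
        sem_init(&worker_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    while (sem_wait(&worker_ready) == -1 && errno == EINTR)
        ;

    /* "thread_suspendAll": signal the worker and wait for its ack. */
    pthread_kill(t, SIGUSR1);
    while (sem_wait(&suspend_ack) == -1 && errno == EINTR)
        ;

    /* ... the collection would happen here ... */

    /* "thread_resumeAll": the post is remembered even if the worker
       has not yet reached sem_wait in its handler. */
    sem_post(&resume_sem);

    atomic_store(&worker_done, 1);
    pthread_join(t, NULL);
    return atomic_load(&resumed_count);
}
```

With more than one thread to resume, the collector would post resume_sem once per suspended thread.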


Any suggestions?



