I think a race condition exists in the Tango & Phobos GC code
redsea
redsea at 163.com
Sun Sep 7 01:50:34 PDT 2008
I have a program written in D that runs 24/7, and I found that it blocks once or twice a week (without using any CPU). Whenever I use strace to check whether it is blocked in a system call, it continues running (strange?).
I can also resume it with kill -SIGUSR2, so I think this situation may be associated with the GC. But why does strace resume it? I checked the strace code and found that it causes a SIGSTOP to be sent, and SIGSTOP cannot be blocked by a signal mask.
Then I checked the library, and I think the problem may be caused by the following execution order:
thread A:                                   thread B:
fullcollect
  thread_suspendAll
    suspend                     ------>
                                            thread_suspendHandler
                                              sem_post( &suspendCount );
    ret from sem_wait( &suspendCount );
  do collect
  thread_resumeAll
    pthread_kill( t.m_addr, SIGUSR2 )       !! this signal would be lost
                                              sigsuspend( &sigres );
Thread B then blocks in sigsuspend, because the SIGUSR2 it is waiting for has already been lost.
Then I checked the Phobos code, and it is similar.
Now I'm trying to use a semaphore for the resume, and will check whether my program runs correctly.
Any suggestions?
More information about the Digitalmars-d mailing list