I think race condition exists in tango & phobos gc code

Sean Kelly sean at invisibleduck.org
Mon Sep 8 07:50:23 PDT 2008


redsea wrote:
> I have a program written in D that runs 24/7. I found that it blocks once or twice a week (without using any CPU). Whenever I use strace to check whether it is blocked in a system call, it continues running (strange?).
> 
> I can also resume it with kill -SIGUSR2, so I think this may be related to the GC. But why does strace help? I checked the strace code and found that it causes SIGSTOP to be sent, and SIGSTOP cannot be blocked by a signal mask.
> 
> Then I checked the library, and I think the problem may be caused by the following execution order:
> 
>    thread A:                                              thread B:     
>    
>    fullcollect 
>       thread_suspendAll
>           suspend                                 
>                                                                thread_suspendHandler
>                                                                sem_post( &suspendCount );
> 
>                ret from sem_wait( &suspendCount );   
>       do collect
>       
>       thread_resumeAll
>                !! this signal would be lost
>                pthread_kill( t.m_addr, SIGUSR2 )
>                                                               
>                                                                sigsuspend( &sigres );         
> 
> Thread B would then block because the SIGUSR2 was lost.

SIGUSR2 shouldn't be lost.  Tango sets sa_mask for the signal handlers 
to tell the OS to block all signals while the handler is processing. 
The call to sigsuspend is supposed to manually change that for the 
signals requested.
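
To make that concrete, here is a minimal C sketch of the pattern. It is
not the actual Tango code. It assumes SIGUSR1 as the suspend signal (only
SIGUSR2, the resume signal, is named in this thread), and the names
ready, resumed and run_handshake_demo are hypothetical helpers added for
the demonstration. Because sa_mask is filled, a SIGUSR2 that arrives
between sem_post and sigsuspend stays pending and is delivered once
sigsuspend unblocks it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <time.h>

static sem_t suspend_count;            /* posted by the thread being suspended */
static sem_t ready;                    /* hypothetical: worker has started */
static volatile sig_atomic_t resumed;  /* set after sigsuspend returns */

/* The resume handler only has to exist so that sigsuspend is interrupted. */
static void resume_handler(int sig) { (void)sig; }

static void suspend_handler(int sig)
{
    sigset_t wait_mask;
    (void)sig;

    /* While waiting, block everything except the resume signal. */
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);

    sem_post(&suspend_count);  /* tell the collecting thread we have stopped */
    /* SIGUSR2 is still blocked here because of sa_mask, so a resume signal
       sent at this point stays pending on this thread. */
    sigsuspend(&wait_mask);    /* atomically unblock SIGUSR2 and wait for it */
    resumed = 1;
}

static void *worker(void *arg)
{
    struct timespec ts = { 0, 1000000 };  /* 1 ms */
    (void)arg;
    sem_post(&ready);
    while (!resumed)
        nanosleep(&ts, NULL);  /* stand-in for real work */
    return NULL;
}

/* Runs one suspend/resume round trip; returns 1 if the worker was resumed. */
int run_handshake_demo(void)
{
    struct sigaction sa;
    pthread_t t;
    int rc;

    sem_init(&suspend_count, 0, 0);
    sem_init(&ready, 0, 0);
    resumed = 0;

    sa.sa_flags = 0;
    sigfillset(&sa.sa_mask);  /* block all signals while a handler runs */
    sa.sa_handler = suspend_handler;
    sigaction(SIGUSR1, &sa, NULL);
    sa.sa_handler = resume_handler;
    sigaction(SIGUSR2, &sa, NULL);

    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
    while (sem_wait(&ready) != 0) { }

    pthread_kill(t, SIGUSR1);                  /* suspend */
    while (sem_wait(&suspend_count) != 0) { }  /* wait for the acknowledgement */
    /* ... the collection would run here ... */
    pthread_kill(t, SIGUSR2);                  /* resume, possibly before sigsuspend */

    pthread_join(t, NULL);
    return (int)resumed;
}
```

Calling run_handshake_demo() from main (built with -pthread) should
return 1 whether the resume signal arrives before or after the worker
reaches sigsuspend.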

> Then I checked the Phobos code, and it is similar.
> 
> Now I'm trying to use a semaphore for the resume, and will check whether my program runs correctly.

Thanks, please do.  If it really is a problem I'd be happy to change it.


Sean


