[Issue 15939] GC.collect causes deadlock in multi-threaded environment

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Wed Apr 27 16:31:05 PDT 2016


https://issues.dlang.org/show_bug.cgi?id=15939

safety0ff.bugz <safety0ff.bugz at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |safety0ff.bugz at gmail.com

--- Comment #9 from safety0ff.bugz <safety0ff.bugz at gmail.com> ---
Could you run strace to get a log of the signal usage?

For example:

strace -f -e signal -o signals.log command_to_run_program

Then attach the resulting signals.log to the bug report?
I don't know whether it will be useful, but it gives us something more to
search for hints.

I'm wondering if there are any other signal handler invocations in the
"...application stack" part of your stack traces.
I've seen a deadlock caused by an assert firing within
thread_suspendHandler, which then deadlocks on the GC lock.

(In reply to Aleksei Preobrazhenskii from comment #6)
> Like, if thread_suspendAll happens while some threads are still in the 
> thread_suspendHandler (already handled resume signal, but still didn't leave 
> the suspend handler).

What should happen in this case is that, because the signal is masked upon
signal handler invocation, the new suspend signal is marked as "pending" and is
delivered once thread_suspendHandler returns and the signal is unblocked.
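
As a rough illustration of the masking behaviour described above, here is a
minimal C sketch. It is not druntime's code: the handler, the use of SIGUSR1,
and the function name demo_pending_during_handler are all hypothetical. The
handler re-sends its own signal while it is still running, checks that the
signal shows up as pending, and relies on the kernel to deliver it only after
the first invocation has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_runs = 0; /* total handler invocations */
static volatile sig_atomic_t nested_runs = 0;  /* invocations that began inside another one */
static volatile sig_atomic_t saw_pending = 0;  /* 1 if the re-sent signal was reported pending */
static volatile sig_atomic_t in_handler = 0;

/* Hypothetical handler: the first invocation re-raises its own signal. */
static void on_signal(int sig)
{
    if (in_handler)
        nested_runs++;
    in_handler = 1;
    handler_runs++;

    if (handler_runs == 1) {
        sigset_t pending;
        raise(sig);                 /* sig is masked here, so it only becomes pending */
        sigemptyset(&pending);
        sigpending(&pending);
        saw_pending = (sigismember(&pending, sig) == 1);
    }

    in_handler = 0;
}

/* Returns 1 if the second signal was held pending and then delivered
 * after the first handler invocation returned, 0 otherwise. */
int demo_pending_during_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);       /* no SA_NODEFER: SIGUSR1 is masked while the handler runs */
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    handler_runs = 0;
    nested_runs = 0;
    saw_pending = 0;
    in_handler = 0;

    raise(SIGUSR1);                 /* first delivery; the handler queues the second */

    return handler_runs == 2 && nested_runs == 0 && saw_pending == 1;
}
```

Calling demo_pending_during_handler() from a single-threaded program is
enough to try this out. The key assumption is that SA_NODEFER is not set, so
the signal being handled is added to the thread's mask for the duration of
the handler.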

The suspended thread cannot receive another resume or suspend signal until
after the sem_post in thread_suspendHandler.

I've mocked up the suspend / resume code and it does not deadlock from the
situation you've described.
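
For readers who want to experiment, a mock of this kind could look roughly
like the C sketch below. To be clear, this is neither the mock-up mentioned
above nor druntime's implementation: the signal choices (SIGUSR1 / SIGUSR2),
the names suspend_ack and run_suspend_resume_cycles, and the single worker
thread are all assumptions made for illustration. It only mirrors the general
shape discussed here: the suspend handler posts a semaphore and then waits in
sigsuspend for the resume signal, and the controlling thread waits on the
semaphore before resuming.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical signal choices for this mock. */
#define SUSPEND_SIG SIGUSR1
#define RESUME_SIG  SIGUSR2

static sem_t suspend_ack;        /* posted by the handler once it has been entered */
static atomic_int stop_worker;

/* The resume handler does nothing; it only has to interrupt sigsuspend. */
static void resume_handler(int sig) { (void)sig; }

static void suspend_handler(int sig)
{
    int saved_errno = errno;
    sigset_t wait_mask;
    (void)sig;

    sigfillset(&wait_mask);
    sigdelset(&wait_mask, RESUME_SIG);  /* only the resume signal may wake us */
    sem_post(&suspend_ack);             /* acknowledge the suspend request */
    sigsuspend(&wait_mask);             /* park until RESUME_SIG is delivered */
    errno = saved_errno;
}

static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker))
        sched_yield();
    return NULL;
}

/* Runs `cycles` back-to-back suspend/resume rounds against one worker
 * thread. Returns the number of completed rounds, or -1 on setup failure. */
int run_suspend_resume_cycles(int cycles)
{
    struct sigaction sa;
    pthread_t tid;
    int completed = 0;

    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);            /* handlers run with every signal blocked, */
    sa.sa_flags = SA_RESTART;           /* so RESUME_SIG stays pending until sigsuspend */
    sa.sa_handler = suspend_handler;
    if (sigaction(SUSPEND_SIG, &sa, NULL) != 0) return -1;
    sa.sa_handler = resume_handler;
    if (sigaction(RESUME_SIG, &sa, NULL) != 0) return -1;

    if (sem_init(&suspend_ack, 0, 0) != 0) return -1;
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;

    for (int i = 0; i < cycles; i++) {
        pthread_kill(tid, SUSPEND_SIG);
        while (sem_wait(&suspend_ack) == -1 && errno == EINTR) { }
        pthread_kill(tid, RESUME_SIG);  /* next round starts immediately */
        completed++;
    }

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    sem_destroy(&suspend_ack);
    return completed;
}
```

Because each round starts immediately after the previous resume, the next
suspend signal can arrive while the worker is still leaving the handler, which
is the situation quoted above. A call such as run_suspend_resume_cycles(1000)
is expected to return 1000 under the reasoning given here; it says nothing
about the deadlock in the report itself. Older toolchains may need -pthread
when building.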

> Real-time POSIX signals (SIGRTMIN .. SIGRTMAX) have stronger delivery
> guarantees

Their queuing and ordering guarantees should be irrelevant due to 
synchronization and signal masks.

I don't see any other benefits of RT signals.
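
For context, the queuing difference being referred to can be seen with a small
C sketch. It is an illustration only, unrelated to druntime's code, and the
function name deliveries_after_blocked_sends is made up for this example. It
blocks a signal, sends it several times with sigqueue, unblocks it, and counts
how often the handler runs. It assumes a single-threaded process on a system
that provides SIGRTMIN (Linux, for instance).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries = 0;

static void count_delivery(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    deliveries++;
}

/* Blocks `sig`, sends it `n` times to this process, unblocks it, and
 * returns how many times the handler ran (or -1 on error). */
int deliveries_after_blocked_sends(int sig, int n)
{
    struct sigaction sa;
    sigset_t block, old;
    union sigval val;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = count_delivery;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0) return -1;

    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;

    deliveries = 0;
    val.sival_int = 0;
    for (int i = 0; i < n; i++)
        if (sigqueue(getpid(), sig, val) != 0) return -1;

    /* Restoring the old mask unblocks `sig`; pending instances are delivered. */
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0) return -1;
    return deliveries;
}
```

On Linux I would expect deliveries_after_blocked_sends(SIGUSR1, 3) to return
1, because a standard signal that is already pending is not queued again, and
deliveries_after_blocked_sends(SIGRTMIN, 3) to return 3, because real-time
signals are queued. In the suspend / resume scheme discussed here, a second
suspend or resume signal is never sent before the previous one has been
handled, so this difference should not come into play.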

(In reply to Walter Bright from comment #8)
> 
> Since you've written the code to fix it, please write a Pull Request for it.
> That way you get the credit!

He modified his code to use the thread_setGCSignals function:
https://dlang.org/phobos/core_thread.html#.thread_setGCSignals


P.S.: I don't mean to sound doubtful; I just want a sound explanation of the
deadlock so that it can be properly addressed at its cause.
