Null references redux

Sean Kelly sean at invisibleduck.org
Tue Sep 29 14:22:14 PDT 2009


== Quote from Jeremie Pelletier (jeremiep at gmail.com)'s article
> Sean Kelly wrote:
> > == Quote from Sean Kelly (sean at invisibleduck.org)'s article
> >> One thing I'm not entirely sure about is whether the signal handler will always
> >> have a valid, C-style call stack tracing back into user code.  These errors are
> >> triggered by hardware, and I really don't know what kind of tricks are common
> >> at that level of OS code.  longjmp() doesn't have this problem because it doesn't
> >> care about the call stack--it just swaps some registers and executes a JMP.  I
> >> don't suppose anyone here knows more about the feasibility of throwing
> >> exceptions from signal handlers at all?  I'll ask around some OS groups and
> >> see what people say.
> >
> > I was right, it is illegal to throw an exception from a signal handler.  And worse,
> > it's illegal to call malloc from a signal handler, so you can't safely create an
> > exception object anyway.  Heck, I'm not sure it's even safe to perform IO from
> > a signal handler, so tracing directly from within the handler won't even work
> > reliably.  In short, while I'm totally fine with people using this in their own
> > code, it's too unreliable to make an "official" solution by adding it to Druntime.
> Weird, it works just fine for me. Maybe it's because the exception is
> always caught in the thread's entry point, I never tried to let such an
> exception unwind past the entry point. I haven't tried malloc or any I/O
> either.

I think in practice, the issue is simply that malloc and the buffered (stdio)
IO routines aren't on the POSIX list of async-signal-safe (reentrant)
functions, so if a signal is delivered while a thread is inside one of these
routines, a signal handler that calls the same routine could cause Bad Things
to happen.  This actually comes up in our GC code on Linux because threads
are suspended for the collection via signals.  If one of these threads is
suspended within a non-reentrant library routine and the GC code calls the
same routine, it can crash or deadlock on an internal mutex (the latter
actually happened on OSX until I changed how GC works there).  This is kind
of a weird issue, since in this case any thread can screw with the GC thread,
even though the GC thread itself never enters a signal handler.  This is
something that never occurred to me before--it was Fawzi who figured out why
OSX apps were deadlocking for no apparent reason (I *think* this was
pre-Druntime, though I can't recall precisely).
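
For what it's worth, here is a minimal sketch in C of a handler that stays
within those limits.  It only calls write(), which is on the POSIX
async-signal-safe list, and sets a sig_atomic_t flag for normal code to pick
up later.  The names on_signal, install_handler and take_signal are made up
for illustration and are not anything in Druntime.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler, read by normal code.  volatile sig_atomic_t is the
   object type a handler is allowed to write to safely. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    /* write() may clobber errno, so save and restore it. */
    int saved_errno = errno;

    /* write() is async-signal-safe; printf() and malloc() are not, so
       they are deliberately avoided here. */
    static const char msg[] = "signal caught\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    got_signal = signo;
    errno = saved_errno;
}

/* Hypothetical helper: installs the handler, returns 0 on success. */
int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: returns the last recorded signal number (0 if
   none) and clears it.  Called from normal code, where allocating,
   logging and throwing are all fine again. */
int take_signal(void)
{
    int s = got_signal;
    got_signal = 0;
    return s;
}
```

Normal code would call install_handler(SIGUSR1) once and then poll
take_signal() at a convenient point.  Anything heavier than setting the flag
is deferred until the handler has returned.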

In short, you may never actually run into a problem using these functions,
and if they work for you then that's all that matters.  I'm just hesitant to
roll something into Druntime that is "undefined" according to a spec and
has only been verified to work through experimentation by a subset of
D users.  I.e., I'd rather Druntime be a tad gimped and always work than
be super fancy and not work for some people.  YMMV.

> There still should be a way to grab the backtrace and context data from
> the hidden ucontext_* parameter and do something with it after returning
> from the signal handler.

Yeah, I saw one suggestion that you could have a thread blocked waiting
for (in this case) backtrace data.  So another thread could do the trace
and no worries about signal handler limitations.  Still, this seems like a
pretty heavyweight approach.
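
A rough C sketch of that suggestion, assuming a pipe is used for the
hand-off: the handler writes a small fixed-size record with write(), and a
reporter thread blocked in read() does the real work in ordinary thread
context.  The trace_record struct and all function names here are
hypothetical, and the record only carries the signal number rather than real
backtrace data.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size record handed from the handler to the reporter.
   A real crash handler might store raw return addresses here instead. */
struct trace_record { int signo; };

static int trace_pipe[2];            /* [0] read end, [1] write end */
static pthread_t reporter;
static int watched_signo;
static int last_reported_signo;      /* written only by the reporter thread */

static void on_signal(int signo)
{
    int saved_errno = errno;
    struct trace_record rec = { signo };
    /* write() is async-signal-safe, and a record this small is written
       to the pipe atomically. */
    ssize_t ignored = write(trace_pipe[1], &rec, sizeof rec);
    (void)ignored;
    errno = saved_errno;
}

/* Ordinary thread context: malloc, stdio, symbol lookup and so on would
   all be fine here.  This sketch just records the signal number. */
static void *reporter_main(void *arg)
{
    struct trace_record rec;
    sigset_t all;
    (void)arg;
    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, NULL);   /* never run handlers here */
    for (;;) {
        ssize_t n = read(trace_pipe[0], &rec, sizeof rec);
        if (n == (ssize_t)sizeof rec)
            last_reported_signo = rec.signo;
        else if (n < 0 && errno == EINTR)
            continue;
        else
            break;                            /* EOF or error: shut down */
    }
    return NULL;
}

/* Creates the pipe and reporter thread, then installs the handler.
   Returns 0 on success. */
int start_reporter(int signo)
{
    struct sigaction sa;
    if (pipe(trace_pipe) != 0)
        return -1;
    if (pthread_create(&reporter, NULL, reporter_main, NULL) != 0)
        return -1;
    watched_signo = signo;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Uninstalls the handler, lets the reporter drain the pipe, and returns
   the last signal number it saw (0 if none). */
int stop_reporter(void)
{
    signal(watched_signo, SIG_DFL);
    close(trace_pipe[1]);             /* reporter sees EOF after draining */
    pthread_join(reporter, NULL);
    close(trace_pipe[0]);
    return last_reported_signo;
}
```

The cost is exactly the heavyweight part: one extra thread and a pipe that
exist only to get out of signal context.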

If there were some way to cache the trace data and then have the same
thread process it I'd love to know how.  I ran into this "can't throw
exceptions from a signal handler" issue at a previous job, and finally
gave up on the idea in frustration after not being able to come up with
a decent workaround.
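
For reference, the longjmp()-style escape mentioned earlier in the thread
looks roughly like the C sketch below.  It keeps everything on one thread by
caching the signal number in a static variable and jumping back to a
recovery point with siglongjmp().  It sidesteps unwinding but not the
reentrancy problem: if the signal lands inside malloc or similar, whatever
runs after the jump may still trip over half-updated state, so I wouldn't
call it a solution.  run_guarded, quiet_work and faulty_work are hypothetical
names, and raise() stands in for a real hardware fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;
static volatile sig_atomic_t recovery_armed = 0;
static volatile sig_atomic_t cached_signo = 0;

static void on_signal(int signo)
{
    /* Only jump if a recovery point is live.  Note that for a real
       hardware fault, returning here would re-run the faulting
       instruction. */
    if (!recovery_armed)
        return;
    cached_signo = signo;       /* cache the data in static storage */
    recovery_armed = 0;
    siglongjmp(recovery_point, 1);
}

/* Runs fn() with a handler for signo installed.  Returns 0 if fn()
   finished normally, the signal number if the handler jumped out of it,
   or -1 if the handler could not be installed. */
int run_guarded(void (*fn)(void), int signo)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) != 0)
        return -1;

    cached_signo = 0;
    /* Passing 1 saves the signal mask, so siglongjmp() restores it and
       the signal is not left blocked after the jump. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        recovery_armed = 1;
        fn();
        recovery_armed = 0;
    }
    /* Reached on normal return and after siglongjmp(); we are back in
       ordinary context on the same thread. */
    sigaction(signo, &old, NULL);
    return cached_signo;
}

/* Hypothetical workloads used to exercise run_guarded(). */
void quiet_work(void)  { }
void faulty_work(void) { raise(SIGUSR1); }
```

With these, run_guarded(faulty_work, SIGUSR1) hands back the signal number
to the same thread that took the signal, while run_guarded(quiet_work,
SIGUSR1) hands back 0.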

> The whole idea of a crash handler is to limit the number of times you
> need to do postmortem debugging after a crash, or launch the process
> again within the debugger.

Yup.  And as a server programmer, I think getting backtraces within a log
file is totally awesome, since dealing with a core dump is difficult at best
for such apps.  In fact I'd probably use your approach within my own code,
since it seems to work.



More information about the Digitalmars-d mailing list