Null references redux

Tue Sep 29 14:38:33 PDT 2009

Sean Kelly wrote:
> == Quote from Jeremie Pelletier (jeremiep at gmail.com)'s article
>> Sean Kelly wrote:
>>> == Quote from Sean Kelly (sean at invisibleduck.org)'s article
>>>> One thing I'm not entirely sure about is whether the signal handler will always
>>>> have a valid, C-style call stack tracing back into user code.  These errors are
>>>> triggered by hardware, and I really don't know what kind of tricks are common
>>>> at that level of OS code.  longjmp() doesn't have this problem because it doesn't
>>>> care about the call stack--it just swaps some registers and executes a JMP.  I
>>>> don't suppose anyone here knows more about the feasibility of throwing
>>>> exceptions from signal handlers at all?  I'll ask around some OS groups and
>>>> see what people say.
>>> I was right, it is illegal to throw an exception from a signal handler.  And worse,
>>> it's illegal to call malloc from a signal handler, so you can't safely create an
>>> exception object anyway.  Heck, I'm not sure it's even safe to perform IO from
>>> a signal handler, so tracing directly from within the handler won't even work
>>> reliably.  In short, while I'm totally fine with people using this in their own
>>> code, it's too unreliable to make an "official" solution by adding it to Druntime.
>> Weird, it works just fine for me. Maybe it's because the exception is
>> always caught in the thread's entry point; I never tried to let such an
>> exception unwind past the entry point. I haven't tried malloc or any I/O
>> either.
> 
> I think in practice, the issue is simply that malloc and IO routines aren't on
> the list of reentrant functions, so if a signal is called from within one of these
> routines then the signal handler trying to call the same routine could cause
> Bad Things to happen.  This actually comes up in our GC code on Linux
> because threads are suspended for the collection via signals.  If one of
> these threads is suspended within a non-reentrant library routine and the
> GC code calls the same routine it can crash or deadlock on an internal
> mutex (the latter actually happened on OSX until I changed how GC works
> there).  This is kind of a weird issue, since in this case any thread can screw
> with the GC thread, even though the GC thread itself never enters a signal
> handler.  This is something that never occurred to me before--it was Fawzi
> that figured out why OSX apps were deadlocking for no reason whatsoever
> (I *think* this was pre-Druntime, though I can't recall precisely).
> 
> In short, you may never actually run into a problem using these functions,
> and if they work for you then that's all that matters.  I'm just hesitant to
> roll something into Druntime that is "undefined" according to a spec and
> has only been verified to work through experimentation by a subset of
> D users.  ie. I'd rather Druntime be a tad gimped and always work than
> be super fancy and not work for some people.  YMMV.

I agree. I don't mind occasional crashes within the crash handler itself 
if it ever comes to that; at that point things are already going pretty 
badly anyway and the process is going to exit soon enough. It 
could be confusing as hell to library users if they don't know this 
might happen in rare cases, so I understand keeping it out of 
Druntime until a proven solution is found.

>> There still should be a way to grab the backtrace and context data from
>> the hidden ucontext_* parameter and do something with it after returning
>> from the signal handler.
> 
> Yeah, I saw one suggestion that you could have a thread blocked waiting
> for (in this case) backtrace data.  So another thread could do the trace
> and no worries about signal handler limitations.  Still, this seems like a
> pretty heavyweight approach.

Eh, I'm not going that way either :) Maybe spawn another process with 
some basic info collected by the signal handler (i.e. registers, loaded 
modules and backtrace) and let that other process deal with generating a 
crash window while we gracefully shut down with a core dump. That's also 
a heavyweight idea, but it only happens after a crash, not while 
waiting for one.

> If there were some way to cache the trace data and then have the same
> thread process it I'd love to know how.  I ran into this "can't throw
> exceptions from a signal handler" issue at a previous job, and finally
> gave up on the idea in frustration after not being able to come up with
> a decent workaround.
> 
>> The whole idea of a crash handler is to limit the number of times you
>> need to do postmortem debugging after a crash, or launch the process
>> again within the debugger.
> 
> Yup.  And as a server programmer, I think getting backtraces within a log
> file is totally awesome, since dealing with a core dump is difficult at best
> for such apps.  In fact I'd probably use your approach within my own code,
> since it seems to work.

Yeah, I'm not much into post-mortem debugging either; I like running 
within the debugger or having a convenient crash window. It's also a neat 
thing to use when you distribute your executable, since you can implement 
an SMTP mailer for the crash reports instead of the crash window.