DConf 2013 keynote

H. S. Teoh hsteoh at quickfur.ath.cx
Fri May 10 16:27:43 PDT 2013


On Fri, May 10, 2013 at 03:02:12PM -0700, Walter Bright wrote:
> On 5/10/2013 2:31 PM, H. S. Teoh wrote:
> >Note how much boilerplate is necessary to make the code work
> >*correctly*.
> 
> It's worse than that. Experience shows that this rat's nest style of
> code often is incorrect because it is both complex and never tested.

Yeah, some of those ifs are very difficult to trigger, and usually
nested so deep in the call tree that most people just don't bother
trying to trigger them. Besides, the lack of built-in unittests in C means
that even if somebody *did* test it at one point, it's very unlikely
that the 15 people who came along later and modified the code will
repeat the same test. And even if they did, it was probably not a
*thorough* test...

Once I was trying to track down a baffling bug that caused a daemon to
suddenly stop responding for no discernible reason. We spent many hours
trying to figure out what went wrong, but didn't get very far.  The
first clue we found was that kill -11 didn't do anything. Now, we have a
segfault handler that writes the stacktrace to a log when the daemon
segfaults, you see, and when debugging we often deliberately use kill
-11 to segfault the daemon then look at the log to find out what it was
doing at the time of the signal.  This usually worked, but not this
time.  The signal seemed to be completely ignored. Only kill -9 was
capable of making the stuck process go away. At first we thought it was
a stray call to signal() or sigaction() that removed the stack trace
handler, but closer inspection suggested that this was not the case.

It turns out that this mysterious "stuck" state was caused by the stack
trace code -- but not in any of the usual ways. In order to produce the
trace, it uses fprintf to write info to the log, and fprintf in turn
calls malloc at various points to allocate the necessary buffers to do
that. Now, if for some reason free() segfaults (e.g., you pass in an
illegal pointer), then libc is still holding the internal malloc mutex
lock when the OS sends the SEGV to the process, so when the stack trace
handler then calls fprintf, which in turn calls malloc, it deadlocks.
Further SIGSEGVs won't help, since they only make the deadlock worse.

All of this came about because we had overlooked the POSIX rule that
only async-signal-safe functions may be called from a signal handler. But
then again... who hasn't?! (Hands up, those of you who knew that fprintf
has undefined behaviour inside a signal handler. Yeah, I thought so.)
Eventually we had to rewrite the stack trace handler to only use write()
to a pre-opened socket to a logging daemon, since otherwise it was
impossible to actually write the stack trace anywhere without risking
undefined behaviour.
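
For what it's worth, here is a minimal sketch in C of a handler along
those lines. It is not our actual code: the names log_fd, safe_puts,
fmt_hex and install_segv_handler are made up for illustration, and the
descriptor is assumed to have been opened (socket, pipe or file) long
before anything goes wrong. The handler only calls write(), signal() and
raise(), which POSIX lists as async-signal-safe, and it formats numbers
by hand instead of going through the printf family. It reports just the
signal number and the faulting address; walking the stack is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical log descriptor, opened before any fault can occur. */
static volatile sig_atomic_t log_fd = STDERR_FILENO;

/* Write a NUL-terminated string using nothing but write(). */
void safe_puts(int fd, const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n <= 0)
            return;              /* nothing sensible to do on error here */
        s += n;
        len -= (size_t)n;
    }
}

/* Format v as lowercase hex into buf (NUL-terminated); returns digit count. */
size_t fmt_hex(unsigned long v, char *buf, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, out = 0;

    do {
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0 && n < sizeof tmp);

    while (n > 0 && out + 1 < cap)
        buf[out++] = tmp[--n];
    if (cap > 0)
        buf[out] = '\0';
    return out;
}

/* No malloc, no stdio: only async-signal-safe calls in here. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    char num[2 * sizeof(unsigned long) + 1];
    int fd = log_fd;
    (void)ctx;

    safe_puts(fd, "fatal signal 0x");
    fmt_hex((unsigned long)sig, num, sizeof num);
    safe_puts(fd, num);
    safe_puts(fd, " at address 0x");
    fmt_hex((unsigned long)(uintptr_t)info->si_addr, num, sizeof num);
    safe_puts(fd, num);
    safe_puts(fd, "\n");

    /* Restore the default action and re-raise so the process still dies
     * from the signal instead of re-entering this handler. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handler; fd is where the report goes. Returns 0 on success. */
int install_segv_handler(int fd)
{
    struct sigaction sa = {0};

    log_fd = fd;
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

A daemon would call install_segv_handler(fd) once at startup, after
opening the log descriptor. Since nothing in the handler takes the
malloc lock, a fault inside free() can no longer deadlock it.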

And none of this has even begun to address the original bug of why
free() was passed an illegal pointer in the first place. Isn't it fun
when most of the time you spend debugging goes into fixing the
error handler rather than the actual bug?


> While D doesn't make it more testable, at least it makes it simple,
> and hence more likely to be correct.

It makes a big difference when the language itself supports certain
constructs like exceptions or scope guards. Scope guards cut away almost
all of the boilerplate cruft in the equivalent C if-and-goto construct,
making the attached statement so simple that it's most likely correct,
as you said. It also eliminates the need to sprinkle various parts of
that code across 2 or 3 different places in an overly-long function with
an unclear execution path, which in C is almost guaranteed to become buggy
after passing through the grubby hands of the next 5 unfortunate coders
assigned to work on the code.
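
To make the comparison concrete, here is a small sketch of the C
if-and-goto construct I mean. The function read_prefix and its
parameters are invented for illustration, not taken from any real
codebase. Note how each acquisition at the top has its matching cleanup
at the bottom, and the two have to be kept in sync by hand; the comments
mark where a D scope guard would sit instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read up to cap-1 bytes from the file at path into
 * a freshly allocated, NUL-terminated buffer. Returns 0 on success and
 * stores the buffer in *out (caller frees it); returns -1 on failure. */
int read_prefix(const char *path, char **out, size_t cap)
{
    int rc = -1;
    FILE *fp = NULL;
    char *buf = NULL;
    size_t n;

    if (cap == 0)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        goto done;
    /* In D, `scope(exit) fclose(fp);` would go right here, and the
     * close_file label below would disappear. */

    buf = malloc(cap);
    if (buf == NULL)
        goto close_file;

    n = fread(buf, 1, cap - 1, fp);
    if (ferror(fp))
        goto free_buf;

    buf[n] = '\0';
    *out = buf;
    buf = NULL;                  /* ownership passes to the caller */
    rc = 0;

free_buf:
    free(buf);                   /* free(NULL) is a no-op on success */
close_file:
    fclose(fp);
done:
    return rc;
}
```

Every new resource added to a function like this needs another label in
the right order at the bottom, which is exactly the part that tends to
rot once other people start modifying it.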

And while the scope guard itself may be buggy (DMD bug, say), it does
get tested very often -- every D program that uses it constitutes a test
case -- so any such bugs are quickly noticed and weeded out.

Seriously, D has so spoiled me I can't stand programming in another
language these days. :-P


T

-- 
EMACS = Extremely Massive And Cumbersome System

