[dmd-internals] Asserts

David Held dmd at wyntrmute.com
Sat Nov 10 11:23:53 PST 2012


On 11/9/2012 11:38 PM, Walter Bright wrote:
> [...]
> I'll often use printf because although the debugger knows types, it 
> rarely shows the values in a form I particularly need to track down a 
> problem, which tends to be different every time. And besides, throwing 
> in a few printfs is fast and easy, whereas setting break points and 
> stepping through a program is an awfully tedious process. Or maybe I 
> never learned to use debugger properly, which is possible since I've 
> been forced to debug systems where no debugger whatsoever was 
> available - I've even debugged programs using an oscilloscope, making 
> clicks on a speaker, blinking an LED, whatever is available.

You're making my point for me, Walter!  I have seen some people whiz 
through the debugger like they live in it, but I would say that level of 
familiarity tends to be the exception, rather than the rule.  And, it 
always makes me a little uncomfortable when I see it (why would someone 
*need* to be that proficient with the debugger...?).  Firing up the 
debugger, for many people, is a relatively expensive process, because it 
isn't something that good programmers should be doing very often (unless 
you subscribe to the school which says that you should always step 
through new code in the debugger...consider this an alternative to 
writing unit tests).

> Note that getting a call stack for a seg fault does not suffer from 
> these problems. I just:
>
>    gdb --args dmd foo.d
>
> and whammo, I got my stack trace, complete with files and line numbers.

There are two issues here.  1) Bugs which don't manifest as a segfault.  
2) Bugs in which a segfault is the manifestation, but the root cause is 
far away (i.e.: not even in the call stack).  I will say more on this below.

> [...]
>> Especially when there may be hundreds of instances running, while 
>> only a few actually experience a problem, logging usually turns out 
>> to be the better choice. Then consider that logging is also more 
>> useful for bug reporting, as well as visualizing the code flow even 
>> in non-error cases.
>
> Sure, but that doesn't apply to dmd. What's best practice for one kind 
> of program isn't for another.

There are many times when a command-line program offers logging of some 
sort which has helped me identify a problem (often a configuration error 
on my part).  Some obvious examples are command shell scripts (which, by 
default, simply tell you everything they are doing...both annoying and 
useful) and makefiles (large build systems with hundreds of makefiles 
almost always require a verbose mode to help debug a badly written 
makefile).

Also, note that when I am debugging a service, I am usually using it in 
a style which is equivalent to dmd.  That is, I get a repro case, I send 
it in to a standalone instance, and I look at the response and the logs.  This 
is really no different from invoking dmd on a repro case.  Even in this 
scenario, logs are incredibly useful because they tell me the 
approximate location where something went wrong.  Sometimes, this is 
enough to go look in the source and spot the error, and other times, I 
have to attach a debugger.  But even when I have to go to the debugger, 
the logs let me skip 90% of the single-stepping I might otherwise have 
to do (because they tell me where things *probably worked correctly*).

> [...]
> I've tried that (see the LOG macros in template.c). It doesn't work 
> very well, because the logging data tends to be far too voluminous. I 
> like to tailor it to each specific problem. It's faster for me, and 
> works.

The problem is not that a logging system doesn't work very well, but 
that a logging system without a configuration system is not first-class, 
and *that* is what doesn't work very well.  If you had something like 
log4j available, you would be able to tailor the output to something 
manageable.  An all-or-nothing log is definitely too much data when you 
turn it on.
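
To make that concrete, here is a rough sketch in D of what a logging 
layer with even a minimal configuration system could look like.  None of 
this is actual dmd code, and the DMD_LOG variable and the category names 
are made up; the point is that the call sites stay in the source 
permanently, while an external switch decides which of them actually 
emit anything:

    import std.array : split;
    import std.process : environment;
    import std.stdio : stderr;

    bool[string] enabledCategories;

    static this()
    {
        // e.g.  DMD_LOG=template,mangle dmd foo.d
        foreach (cat; environment.get("DMD_LOG", "").split(","))
            if (cat.length)
                enabledCategories[cat] = true;
    }

    void log(string category, lazy string msg)
    {
        // 'lazy' means a disabled category never even builds the message
        if (category in enabledCategories)
            stderr.writefln("[%s] %s", category, msg);
    }

A call site stays as simple as log("template", "instantiating " ~ name), 
and choosing which categories are visible becomes a matter of setting one 
environment variable rather than recompiling with a different LOG macro.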

On 11/9/2012 11:44 PM, Walter Bright wrote:
> [...]
> There is some async code in there. If I suspect a problem with it, 
> I've left in the single thread logic, and switch to that in order to 
> make it deterministic.

But that doesn't tell you what the problem is.  It just lets you escape 
to something functional by giving up on the parallelism. Logs at least 
tell you the running state in the parallel case, which is often enough 
to guess at what is wrong.  Trying to find a synchronization bug in 
parallel code is pretty darned difficult in a debugger (for what I hope 
are obvious reasons).

> [...]
> Actually, very very few bugs manifest themselves as seg faults. I 
> mentioned before that I regard the emphasis on NULL pointers to be 
> wildly excessive.

I would like to define a metric, which I call "bug depth".  Suppose that 
incorrect program behavior is noticed, and bad behavior is associated 
with some symbol, S.  Now, it could be that there is a problem with the 
immediate computation of S, whatever that might be (I mean, like in the 
same lexical scope).  Or, it could be that S is merely a victim of a bad 
computation somewhere else (i.e.: the computation of S received a bad 
input from some other computation). Let us call the bad input S'.  Now, 
it again may be the case that S' is a first-order bad actor, or that it 
is the victim of a bug earlier in the computation, say, from S''.  Let 
us call the root cause symbol R.  Now, there is some trail of 
dependencies from R to S which explains the manifestation of the bug.  
And let us call the number of references which must be followed from S 
to R the "bug depth".
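
Here is a contrived D illustration of a bug of depth 2 (the names have 
nothing to do with actual dmd symbols).  The assertion trips at S, and 
the code around S is blameless, as is S'; the mistake is two references 
away, at R:

    struct Type { string name; }

    // R: the root cause.  One case is forgotten and Type.init leaks out.
    Type resolveAlias(string id)
    {
        if (id == "myint")
            return Type("int");
        return Type.init;            // name is empty here
    }

    // S': merely passes the bad value along.
    Type typeOf(string expr)
    {
        return resolveAlias(expr);
    }

    void emit(string expr)
    {
        Type t = typeOf(expr);       // S: the failure surfaces here, far
        assert(t.name.length > 0);   //    from the code that caused it
    }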

Now that we have this metric, we can talk about "shallow" bugs and 
"deep" bugs.  When a segfault is caused by code immediately surrounding 
the bad symbol, we can say that the bug causing the segfault is 
"shallow".  And when it is caused by a problem, say, 5 function calls 
away, in non-trivial functions, it is probably fair to say that the bug 
is "deep".  In my experience, shallow bugs are usually simple mistakes.  
A programmer failed to check a boundary condition due to laziness, used 
the wrong operator, transposed some symbols, re-used a variable they 
shouldn't have, etc.  And you know they are simple 
mistakes when you can show the offending code to any programmer 
(including ones who don't know the context), and they can spot the bug.  
These kinds of bugs are easy to identify and fix.
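
For instance, a shallow bug might look like this (illustrative only); 
show it to any programmer and they will spot it without knowing anything 
about the surrounding program:

    int last(int[] a)
    {
        return a[a.length];     // off by one; should be a[a.length - 1]
    }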

The real problem is when you look at the code where something is 
failing, and there is no obvious explanation for the failure.  Ok, maybe 
being able to see the state a few frames up the stack will expose the 
root cause.  When this happens, happy day!  It's not the shallowest bug, 
but the stack is the next easiest context in which to look for root 
causes.  The worst kinds of bugs happen when *everyone thinks they did 
the right thing*, and what really happened is that two coders disagreed 
on some program invariant.  This is the kind of bug which tends to take 
the longest to figure out, because most of the code and program state 
looks the way everyone expects it to look.  And when you finally 
discover the problem, it isn't a 1-line fix, because an entire module 
has been written with this bad assumption, or the code does something 
fairly complicated that can't be changed easily.
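
As a made-up illustration of the disagreed-invariant case (neither 
function below comes from dmd): the author of the table assumed keys 
arrive already lower-cased, a later coder normalizes on lookup instead, 
and neither piece of code looks wrong on its own:

    import std.string : toLower;

    string[string] typedefs;

    void declare(string name, string type)
    {
        // Unstated assumption: 'name' arrives already lower-cased.
        typedefs[name] = type;
    }

    string lookup(string name)
    {
        // The other coder normalizes here instead, so a declaration made
        // with a mixed-case name now throws a RangeError on lookup.
        return typedefs[name.toLower()];
    }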

There are several ways to defend against these types of bugs, all of 
which have a cost.  There's the formal route, where you specify all 
valid inputs and outputs for each function (as documentation). There's 
the testing route, where you write unit tests for each function.  And 
there's the contract-based route, where you define invariants checked at 
runtime.  In fact, all 3 are valuable, but the return on investment for 
each one depends on the scale of the program.

Although I think good documentation is essential for a multi-coder 
project, I would probably do that last.  In fact, the technique which is 
the cheapest but most effective is to simply assert all your invariants 
inside your functions.  Yes, this includes things you think are silly, 
like checking for NULL pointers.  But it also includes things which are 
less silly, like checking for empty strings, empty containers, and other 
assumptions about the inputs.  It's essentially an argument for 
contract-based programming.  D has this feature in the language; it is 
ironic that it is virtually absent from the compiler itself.  There are 
probably more instances of assert(0) in the code than of any other assert.
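
Here is roughly what I mean, sketched in D itself; the Scope class and 
its members are invented for illustration, not taken from the compiler.  
The "silly" null and empty checks live in contracts and invariants, so a 
bad object is caught where it is created rather than five calls later:

    class Scope
    {
        Scope parent;
        string name;

        this(string name, Scope parent = null)
        {
            this.name = name;
            this.parent = parent;
        }

        // Checked on entry and exit of every public member function
        // (and after construction) in non-release builds.
        invariant()
        {
            assert(name.length > 0, "a Scope must always have a name");
        }

        Scope findEnclosing(string id)
        in
        {
            assert(id.length > 0, "an empty identifier is a caller bug");
        }
        body
        {
            for (Scope s = this; s !is null; s = s.parent)
            {
                if (s.name == id)
                    return s;
            }
            return null;    // 'not found' is a valid answer, not an error
        }
    }

And all of these checks disappear in a -release build, so they only cost 
anything in the builds where you actually want them.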

DMD has a fair number of open bugs left, and if I had to guess, the easy 
ones have already been cherry-picked.  That means the remaining ones are 
far more likely to be deep bugs than shallow ones.  And the only way 
I know how to attack deep bugs (both proactively and reactively) is to 
start making assumptions explicit (via assertions, exceptions, 
documentation), and give the people debugging a visualization of what is 
happening in the program via logs/debug output.  Oftentimes, a log file 
will show patterns that give you a fuzzy, imprecise, but still useful 
sense of what is happening, because when a bug shows up, it disrupts 
the pattern in some obvious way.  This is what I mean by "visualizing 
the flow".  It's being able to step back from the bark-staring which is 
single-stepping, and trying to look at a stand of trees in the forest.

Dave


