What does Coverity/clang static analysis actually do?

Thu Oct 1 11:21:05 PDT 2009

I've been interested in having the D compiler take advantage of the flow 
analysis in the optimizer to do some more checking. Coverity and clang 
get a lot of positive press about doing this, but any details of exactly 
*what* they do have been either carefully hidden (in Coverity's case) or 
undocumented (clang's page on this is blank). All I can find is 
marketing hype and a lot of vague handwaving.

Here is what I've been able to glean about what they detect, from much 
time spent with Google and my knowledge of how data flow analysis works:

1. dereference of NULL pointers (all reaching definitions of a pointer 
are NULL)

2. possible dereference of NULL pointers (some reaching definitions of a 
pointer are NULL)

3. use of uninitialized variables (no reaching definition)

4. dead assignments (assignment of a value to a variable that is never 
subsequently used)

5. dead code (code that can never be executed)

6. array overflows

7. proper pairing of allocate/deallocate function calls

8. improper use of signed integers (who knows what this actually is)
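
To make items 1 through 3 concrete, here is a toy reaching-definitions 
pass in Python. It is only a sketch of the textbook technique, not code 
from dmd, Coverity or clang. The tuple-based IR, the block names and the 
function classify_uses are all made up for the illustration.

```python
NULL = "NULL"

def classify_uses(blocks, preds):
    """Classify each pointer use by the definitions that reach it.

    blocks: {name: [(op, var, value), ...]}, op is "def" or "use"
            (a hypothetical toy IR, assumed for this sketch)
    preds:  {name: [predecessor block names]}
    """
    def flow(name, on_use=None):
        # IN set = union of the OUT sets of all predecessors
        reach = set().union(*(out[p] for p in preds.get(name, [])))
        for op, var, val in blocks[name]:
            if op == "def":
                # a new definition kills earlier ones of the same variable
                reach = {d for d in reach if d[0] != var}
                reach.add((var, val, name))
            elif on_use is not None:
                on_use(name, var, {d[1] for d in reach if d[0] == var})
        return reach

    # iterate to a fixed point over the OUT sets
    out = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name in blocks:
            new_out = flow(name)
            if new_out != out[name]:
                out[name], changed = new_out, True

    reports = []

    def on_use(name, var, values):
        if not values:
            kind = "uninitialized"              # item 3
        elif values == {NULL}:
            kind = "null dereference"           # item 1
        elif NULL in values:
            kind = "possible null dereference"  # item 2
        else:
            kind = "ok"
        reports.append((name, var, kind))

    for name in blocks:
        flow(name, on_use)
    return reports

# p is reassigned on one path only, q stays NULL, r is never assigned
blocks = {
    "entry": [("def", "p", NULL), ("def", "q", NULL)],
    "then":  [("def", "p", "malloc")],
    "join":  [("use", "p", None), ("use", "q", None), ("use", "r", None)],
}
preds = {"then": ["entry"], "join": ["entry", "then"]}

for report in classify_uses(blocks, preds):
    print(report)
```

The point of the sketch is that all three checks fall out of the same 
reaching-definitions sets. They differ only in the question asked at 
each use.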

Frankly, this is not an impressive list. These issues are discoverable 
using standard data flow analysis, and in fact are part of Digital Mars' 
optimizer. Here is the current state of it for dmd:

1. Optimizer discovers it, but has been ignoring the information. 
Prompted by the recent thread on it, I added a report for it in D (it is 
still ignored for C). The downside is I can no longer use 
*cast(char*)0=0 to drop me into the debugger, but I can live with that 
as assert(0) will do the same thing.

2. Optimizer collects the info, but ignores it, because people are 
annoyed by false positives.

3. Optimizer detects and reports it. Irrelevant for D, though, because 
variables are always initialized. The =void case is rare enough to be 
irrelevant.

4. Dead assignments are automatically detected and removed. I'm not 
convinced this should be reported, as it can legitimately happen in 
machine-generated source code. Generating false positives annoys the 
heck out of users.

5. Dead code is detected and silently removed by the optimizer. The dmd 
front end will complain about dead code.

6. Arrays are solidly covered by a runtime check. There is code in the 
optimizer to detect many cases of overflows at compile time, but the 
code is currently disabled because the runtime check covers 100% of the 
cases.

7. Not done because it requires the user to specify what the paired 
functions are. Given this info, it is rather simple to graft onto 
existing data flow analysis.

8. D2 has acquired some decent checking for this.
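
As an illustration of item 7, here is a minimal Python sketch of a 
pairing check driven by a user-supplied table. It is an assumption-laden 
toy: the PAIRS table, the (function, handle) call records and 
check_pairing are invented for the example, and it only walks 
straight-line code. A real check would push the same bookkeeping through 
the control-flow graph.

```python
# Hypothetical user-supplied table: allocator -> matching deallocator
PAIRS = {"malloc": "free", "fopen": "fclose"}

def check_pairing(calls, pairs=PAIRS):
    """calls: list of (function, handle) tuples in program order."""
    closers = set(pairs.values())
    open_handles = {}   # handle -> allocator that produced it
    problems = []
    for fn, handle in calls:
        if fn in pairs:
            open_handles[handle] = fn
        elif fn in closers:
            opener = open_handles.pop(handle, None)
            if opener is None:
                problems.append((handle, f"{fn} without matching allocation"))
            elif pairs[opener] != fn:
                problems.append((handle, f"{fn} does not pair with {opener}"))
    # anything still open at the end was never released
    for handle, opener in open_handles.items():
        problems.append((handle, f"{opener} never released"))
    return problems

trace = [("malloc", "a"), ("fopen", "f"), ("free", "f"), ("free", "b")]
for problem in check_pairing(trace):
    print(problem)
```

The only input the analysis cannot work out for itself is the table of 
pairs, which is why the user has to supply it.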

There's a lot of hoopla about these static checkers, but I'm not 
impressed by them based on what I can find out about them. What do you 
know about what these checkers do that is not on this list? Any other 
kinds of checking that would be great to implement?

D's dead code checking has been an encouraging success, and I think 
people will like the null dereference checks. More along these lines 
will be interesting.