What does Coverity/clang static analysis actually do?
Walter Bright
newshound1 at digitalmars.com
Thu Oct 1 11:21:05 PDT 2009
I've been interested in having the D compiler take advantage of the flow
analysis in the optimizer to do some more checking. Coverity and clang
get a lot of positive press about doing this, but any details of exactly
*what* they do have been either carefully hidden (in Coverity's case) or
undocumented (clang's page on this is blank). All I can find is
marketing hype and a lot of vague handwaving.
Here is what I've been able to glean from much time spent with Google on
what they detect, combined with my knowledge of how data flow analysis works:
1. dereference of NULL pointers (all reaching definitions of a pointer
are NULL)
2. possible dereference of NULL pointers (some reaching definitions of a
pointer are NULL)
3. use of uninitialized variables (no reaching definition)
4. dead assignments (assignment of a value to a variable that is never
subsequently used)
5. dead code (code that can never be executed)
6. array overflows
7. proper pairing of allocate/deallocate function calls
8. improper use of signed integers (who knows what this actually is)
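Items 1 through 3 can all be read off a single reaching-definitions pass. The following is a minimal sketch of that idea in Python, not dmd's optimizer code. The toy control-flow-graph encoding (`blocks`, `succs`), the statement tuples, and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy IR: each block is a list of statements.
#   ("def", var, value)  assigns var; value is "null" or "nonnull"
#   ("deref", var)       dereferences var


def transfer(stmts, block, in_set):
    """Apply a block's definitions to the set of definitions reaching it."""
    out = set(in_set)
    for i, stmt in enumerate(stmts):
        if stmt[0] == "def":
            _, var, value = stmt
            out = {d for d in out if d[0] != var}  # kill older defs of var
            out.add((var, value, (block, i)))      # gen the new def
    return out


def reaching_definitions(blocks, succs):
    """Iterate to a fixed point; return the definitions reaching each block entry."""
    preds = defaultdict(list)
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)
    in_sets = {b: set() for b in blocks}
    out_sets = {b: transfer(blocks[b], b, set()) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set().union(*(out_sets[p] for p in preds[b]))
            new_out = transfer(blocks[b], b, new_in)
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets


def classify_derefs(blocks, succs):
    """Report each dereference as null, possibly null, or uninitialized."""
    in_sets = reaching_definitions(blocks, succs)
    reports = []
    for b, stmts in blocks.items():
        live = set(in_sets[b])
        for i, stmt in enumerate(stmts):
            if stmt[0] == "def":
                _, var, value = stmt
                live = {d for d in live if d[0] != var}
                live.add((var, value, (b, i)))
            elif stmt[0] == "deref":
                var = stmt[1]
                values = {d[1] for d in live if d[0] == var}
                if not values:
                    reports.append((b, i, var, "uninitialized"))  # item 3
                elif values == {"null"}:
                    reports.append((b, i, var, "null"))           # item 1
                elif "null" in values:
                    reports.append((b, i, var, "possibly null"))  # item 2
    return reports


# p is null on one path and reassigned on the other; q is always null;
# r is never assigned at all.
blocks = {
    "entry": [("def", "p", "null"), ("def", "q", "null")],
    "then": [("def", "p", "nonnull")],
    "join": [("deref", "p"), ("deref", "q"), ("deref", "r")],
}
succs = {"entry": ["then", "join"], "then": ["join"], "join": []}
reports = classify_derefs(blocks, succs)
```

On this example `p` is reported as possibly null, `q` as null, and `r` as uninitialized. The "possibly null" case is exactly where false positives come from: the analysis cannot tell whether the null path is ever actually taken.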
Frankly, this is not an impressive list. These issues are discoverable
using standard data flow analysis, and in fact are part of Digital Mars'
optimizer. Here is the current state of it for dmd:
1. Optimizer discovers it, but ignores the information. Due to the
recent thread on it, I added a report for it for D (still ignored for
C). The downside is I can no longer use *cast(char*)0=0 to drop me into
the debugger, but I can live with that as assert(0) will do the same thing.
2. Optimizer collects the info, but ignores it, because people are
annoyed by false positives.
3. Optimizer detects and reports it. Irrelevant for D, though, because
variables are always initialized. The =void case is rare enough to be
irrelevant.
4. Dead assignments are automatically detected and removed. I'm not
convinced this should be reported, as it can legitimately happen when
generating source code. Generating false positives annoys the heck out of
users.
5. Dead code is detected and silently removed by the optimizer. The dmd
front end will complain about dead code.
6. Arrays are solidly covered by a runtime check. There is code in the
optimizer to detect many cases of overflows at compile time, but the
code is currently disabled because the runtime check covers 100% of the
cases.
7. Not done because it requires the user to specify what the paired
functions are. Given this info, it is rather simple to graft onto
existing data flow analysis.
8. D2 has acquired some decent checking for this.
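To make item 7 concrete, here is a rough sketch of a pairing check driven by a user-supplied table. It is written in Python and deliberately simplified: it walks a single execution path rather than a full flow graph, and the `PAIRS` table, the `(function, handle)` call encoding, and the message strings are all hypothetical. A real checker would run this state over every path, or merge states at join points.

```python
# Hypothetical user-supplied table: acquire function -> release function.
PAIRS = {"malloc": "free", "fopen": "fclose"}


def check_pairing(calls, pairs=PAIRS):
    """Check one path of (function, handle) calls for mismatched pairs."""
    release_to_acquire = {rel: acq for acq, rel in pairs.items()}
    open_handles = {}  # handle -> function that acquired it
    problems = []
    for func, handle in calls:
        if func in pairs:
            open_handles[handle] = func
        elif func in release_to_acquire:
            expected = release_to_acquire[func]
            acquired_by = open_handles.pop(handle, None)
            if acquired_by is None:
                # Released twice, or released without ever being acquired.
                problems.append((handle, f"{func} without matching {expected}"))
            elif acquired_by != expected:
                problems.append((handle, f"{func} used on handle from {acquired_by}"))
    # Anything still open at the end of the path was never released.
    for handle, func in open_handles.items():
        problems.append((handle, f"{func} never paired with {pairs[func]}"))
    return problems


calls = [
    ("malloc", "a"),
    ("fopen", "f"),
    ("free", "a"),
    ("free", "a"),    # double free
    ("free", "f"),    # wrong release function
    ("malloc", "b"),  # never freed
]
problems = check_pairing(calls)
```

The table is the only part the user has to supply; the rest is ordinary bookkeeping over the same flow information the optimizer already has.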
There's a lot of hoopla about these static checkers, but I'm not
impressed by them based on what I can find out about them. What do you
know about what these checkers do that is not on this list? Any other
kinds of checking that would be great to implement?
D's dead code checking has been an encouraging success, and I think
people will like the null dereference checks. More along these lines
will be interesting.