What does Coverity/clang static analysis actually do?

Thu Oct 1 16:46:15 PDT 2009

On Thu, 1 Oct 2009, Walter Bright wrote:

> I've been interested in having the D compiler take advantage of the flow
> analysis in the optimizer to do some more checking. Coverity and clang get a
> lot of positive press about doing this, but any details of exactly *what* they
> do have been either carefully hidden (in Coverity's case) or undocumented
> (clang's page on this is blank). All I can find is marketing hype and a lot of
> vague handwaving.
> 
> Here is what I've been able to glean from much time spent with google on what
> they detect and my knowledge of how data flow analysis works:
> 

Snipped a lot of the detail, because that's not really what makes the 
tools interesting.  There are a few things that do, in my opinion.  That's 
based on a little experience: I've used Fortify, and I've looked at 
Coverity a couple of times over the years (and would be using it if it 
weren't so much more expensive than Fortify).

1) Rich flow analysis.  They go well beyond what's typically done by 
compilers during their optimization passes.  They tend to be whole-program 
in scope and actually DO the parts that are hard, like tracking variable 
values across expressions, similar to a couple of examples in this thread.  
Function boundaries are no obstacle to them.  The only obstacle is where 
source isn't provided.
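
To make the cross-function value tracking concrete, here's a toy sketch in 
Python.  It is not how Coverity or Fortify are implemented; the statement 
format and every name in it (PROGRAM, ARRAY_SIZES, analyze) are made up 
for illustration.  It tracks known constants through a call and flags an 
array index that is provably out of range:

```python
# Toy program: each function is a list of (op, ...) statements.
# PROGRAM, ARRAY_SIZES and analyze are hypothetical names for this sketch.
PROGRAM = {
    "pick_slot": {
        "params": ["n"],
        "body": [
            ("add", "r", "n", 4),                     # r = n + 4
            ("return", "r"),
        ],
    },
    "main": {
        "params": [],
        "body": [
            ("const", "base", 6),                     # base = 6
            ("call", "slot", "pick_slot", ["base"]),  # slot = pick_slot(base)
            ("index", "buf", "slot"),                 # use of buf[slot]
        ],
    },
}
ARRAY_SIZES = {"buf": 8}


def analyze(func, args, findings, depth=0):
    """Abstractly run `func`, tracking known constants (None = unknown)."""
    if depth > 16:  # crude guard against recursion in the analyzed program
        return None
    spec = PROGRAM[func]
    env = dict(zip(spec["params"], args))
    for stmt in spec["body"]:
        op = stmt[0]
        if op == "const":
            _, dst, value = stmt
            env[dst] = value
        elif op == "add":
            _, dst, src, k = stmt
            v = env.get(src)
            env[dst] = None if v is None else v + k
        elif op == "call":
            # Follow the call instead of giving up at the function boundary.
            _, dst, callee, arg_names = stmt
            call_args = [env.get(a) for a in arg_names]
            env[dst] = analyze(callee, call_args, findings, depth + 1)
        elif op == "index":
            _, arr, idx = stmt
            v = env.get(idx)
            if v is not None and not (0 <= v < ARRAY_SIZES[arr]):
                findings.append(
                    f"{func}: {arr}[{idx}] with {idx}={v}, "
                    f"size {ARRAY_SIZES[arr]}"
                )
        elif op == "return":
            return env.get(stmt[1])
    return None


findings = []
analyze("main", [], findings)
print(findings)
```

A real tool has to cope with branches, loops, aliasing and value ranges 
rather than single constants, which is where the cost comes from.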

2) Due to working with whole source bases, the UI for managing the data 
produced is critical to overall usability.  A lot of time goes into making 
it easy to manage the output, both for single runs and for cross-run flow 
of data.  Some examples:

   * suppression of false positives
   * graphing of issue trends
   * categorization of issue types

3) Rule creation.  The core engine usually generates some digested dataset 
against which rules are evaluated.  The systems come with a built-in set 
that does the sorts of things already talked about.  In addition they come 
with the ability to develop new rules specific to your application and 
business needs.  For example:

   * tracking of taint from user data
   * what data is acceptable to log to files (for example NOT credit-cards)
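
As a rough illustration of what such a custom rule boils down to, here's 
a small Python sketch.  The rule format, the function names in it 
(read_card_number, mask_card, log_write, format) and the flattened 
statement list are all hypothetical, not any vendor's rule language.  The 
rule names sources of sensitive data, sanitizers that make it safe, and 
sinks it must not reach:

```python
# Hypothetical rule: card numbers must not reach the log unless masked.
RULE = {
    "sources": {"read_card_number"},
    "sanitizers": {"mask_card"},
    "sinks": {"log_write"},
}

# Hypothetical digested form of some code: (op, dst, function, args).
STATEMENTS = [
    ("call", "card", "read_card_number", []),
    ("call", "msg", "format", ["card"]),
    ("call", "safe", "mask_card", ["card"]),
    ("call", None, "log_write", ["safe"]),
    ("call", None, "log_write", ["msg"]),
]


def check_taint(statements, rule):
    """Return (statement_number, sink, variable) for each tainted sink use."""
    tainted = set()
    violations = []
    for number, (_, dst, func, args) in enumerate(statements, start=1):
        if func in rule["sinks"]:
            for arg in args:
                if arg in tainted:
                    violations.append((number, func, arg))
        if dst is None:
            continue
        if func in rule["sources"]:
            tainted.add(dst)          # fresh sensitive data
        elif func in rule["sanitizers"]:
            tainted.discard(dst)      # result is considered clean
        elif any(arg in tainted for arg in args):
            tainted.add(dst)          # taint propagates through the call
        else:
            tainted.discard(dst)      # overwritten with clean data
    return violations


violations = check_taint(STATEMENTS, RULE)
print(violations)
```

Only the second log_write is reported, because msg was built from the raw 
card number while safe went through the sanitizer first.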

4) They're expected to be slower than compilation, so it's ok to do things 
that are computationally prohibitive to do during compilation cycles.

----

I've seen these tools detect some amazingly subtle bugs in C and C++ code.  
They're particularly handy in messy code.  They can help find memory 
leaks where the call graphs are arbitrarily obscure.  They can also find 
sites where a NULL pointer is passed into a function that dereferences it 
without a null check, even when the call graph has many layers.
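
The NULL case can be sketched with per-function summaries that get 
propagated up the call graph until nothing changes.  Again this is a toy 
in Python under my own assumptions: the facts below (DIRECT_DEREFS, 
FORWARDS, NULL_CALLS) stand in for whatever a real front end would 
extract from the C source, and the file locations are invented.

```python
# Parameter indices each function dereferences with no NULL check.
DIRECT_DEREFS = {"parse": set(), "load": set(), "read_header": {0}}

# (caller, caller_param, callee, callee_param): the caller passes its
# parameter straight through to the callee without checking it.
FORWARDS = [
    ("parse", 0, "load", 1),
    ("load", 1, "read_header", 0),
]

# Call sites that pass a literal NULL: (location, callee, param_index).
NULL_CALLS = [
    ("main.c:42", "parse", 0),
    ("main.c:57", "load", 0),
]


def unchecked_params(direct, forwards):
    """Fixed point: which params end up dereferenced unchecked, at any depth."""
    summary = {func: set(params) for func, params in direct.items()}
    changed = True
    while changed:
        changed = False
        for caller, caller_param, callee, callee_param in forwards:
            if (callee_param in summary[callee]
                    and caller_param not in summary[caller]):
                summary[caller].add(caller_param)
                changed = True
    return summary


summary = unchecked_params(DIRECT_DEREFS, FORWARDS)
reports = [
    (location, func, index)
    for location, func, index in NULL_CALLS
    if index in summary[func]
]
print(reports)
```

The call at main.c:42 is flagged even though the dereference happens two 
calls further down, in read_header.  The call at main.c:57 is not, since 
load never forwards its first parameter to anything that dereferences it.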

Yes, rigid contract systems and richer type systems can help reduce the 
need for some of these sorts of checks, but as we all know, there are 
tradeoffs.

That help?

Later,
Brad