Redundancies often reveal bugs

retard re at tard.com.invalid
Sat Oct 2 09:16:30 PDT 2010


Thu, 30 Sep 2010 21:12:53 -0400, bearophile wrote:

> Here (pdf alert) I have found a very simple but interesting paper that
> has confirmed a hypothesis of mine.
> 
> This is a page that contains a pdf that shows a short introduction to
> the paper: http://www.ganssle.com/tem/tem80.htm
> 
> This is the paper, "Using Redundancies to Find Errors", by Yichen Xie
> and Dawson Engler, 2002: www.stanford.edu/~engler/p401-xie.pdf
> 
> 
> A trimmed down quote from the tem80 page:
> 
>> Researchers at Stanford have just released a paper detailing their use
>> of automated tools to look for redundant code in 1.6 million lines of
>> Linux. "Redundant" is defined as:
>> - Idempotent operations (like assigning a variable to itself)
>> - Values assigned to variables that are not subsequently used
>> - Dead code
>> - Redundant conditionals
>>
>> They found that redundancies, even when harmless, strongly correlate
>> with bugs. Even when the extra code causes no problems, odds are high
>> that other, real, errors will be found within a few lines of the
>> redundant operations.
>>
>> Block-copied code is often suspect, as the developer neglects to change
>> things needed for the code's new use. Another common problem area: error
>> handlers, which are tough to test and are, in data I've gathered, a
>> huge source of problems in deployed systems. The authors note that their
>> use of lint has long produced warnings about unused variables and return
>> codes, which they've always treated as harmless stylistic issues. Now
>> it's clear that lint is indeed signalling something that may be
>> critically important. The study makes me wonder if compilers that
>> optimize out dead code to reduce memory needs aren't in fact doing us a
>> disservice. Perhaps they should error and exit instead.
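
To make the paper's categories concrete, here's a small illustration of my
own (not from the paper) where two "harmless" redundancies are the visible
half of one real bug:

```java
// My own illustration (not from the paper): an idempotent self-assignment
// and a dead store that together point at the actual bug nearby.
public class Redundant {
    static int clamp(int v, int lo, int hi) {
        if (v < lo) return lo;
        if (v > hi) return hi;
        return v;
    }

    public static void main(String[] args) {
        int x = 5;

        // Idempotent operation: the self-assignment does nothing...
        x = x;
        // ...and the clamped value is assigned but never used. Together
        // they reveal the real bug: the author meant x = clamp(x, 0, 3).
        int clamped = clamp(x, 0, 3);

        System.out.println(x);  // prints the unclamped value, 5
    }
}
```

Neither redundancy is an error by itself, which is exactly why the
correlation the paper found is interesting.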

If you've ever compiled open source code, you've probably noticed that 
some developers take software quality seriously: their programs produce no 
warnings or errors at compile time. That's not very impressive when the 
code is below 5,000 LOC, but applying the same principle as the codebase 
grows to 500,000 LOC is a big win.

OTOH, there are lots of projects developed by lazy bastards. Something 
ALWAYS breaks; a minor update from gcc ?.?.0 to ?.?.1 seems to be enough. 
The developers were too lazy to study even the basic functionality of C 
and seem rather surprised when the compiler prevents data corruption, 
segfaults, or other nondeterministic states. I always treat code with lots 
of these bugs as completely rotten. In distros like Gentoo, these bugs 
prevent people from actually installing and using the program.

> class Foo {
>     int x, y;
>     this(int x_, int y_) {
>         this.x = x;  // bug: assigns the field to itself; x_ was meant
>         y = y;       // bug: another self-assignment; y_ was meant
>     }
> }
> void main() {}

Some languages prevent this bug by making parameters immutable in some 
sense (at least shallowly immutable). It's even possible in Java, and at 
one place I worked previously, "final params by default" was one of the 
rules in the code review and style guides.
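
A minimal sketch of that rule in Java (my own example, not from that style
guide): with parameters declared final, accidentally writing `x = x;`
instead of `this.x = x;` is rejected at compile time, so the
self-assignment bug above cannot slip through silently.

```java
// Sketch: final parameters make the shadowing/self-assignment bug from
// the quoted D snippet a compile error instead of a silent no-op.
class Point {
    private final int x, y;

    Point(final int x, final int y) {
        this.x = x;  // must qualify with "this"; "x = x;" would not compile
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }
}

public class Main {
    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(p.getX() + "," + p.getY());
    }
}
```

The fields are final too, so a constructor path that forgets to assign one
of them is also a compile error.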


More information about the Digitalmars-d mailing list