Notes from C++ static analysis

Wed Jun 26 11:52:38 PDT 2013

On Wed, Jun 26, 2013 at 08:08:08PM +0200, bearophile wrote:
> An interesting blog post found through Reddit:
> 
> http://randomascii.wordpress.com/2013/06/24/two-years-and-thousands-of-bugs-of-/
[...]
> The most common problems they find are errors in the format strings
> of printf-like functions (even though the code is C++):

None of my C++ code uses iostream. I still find stdio.h more comfortable
to use, in spite of its many problems. One of the most annoying features
of iostream is the abuse of operator<< and operator>> for I/O. Format
strings are an ingenious idea sorely lacking in the iostream department
(though admittedly the way it was implemented in stdio is rather unsafe,
due to the inability of C to do many compile-time checks).

> >The top type of bug that /analyze finds is format string errors –
> >mismatches between printf-style format strings and the corresponding
> >arguments. Sometimes there is a missing argument, sometimes there is
> >an extra argument, and sometimes the arguments don’t match, such as
> >printing a float, long or ‘long long’ with %d.<
> 
> Such errors in D are less bad, because writef("%d",x) is usable for
> all kinds of integral values.

Less bad? Actually, IME format strings in D are amazingly useful! You
can pretty much use %s 99% of the time, because the argument types are
known statically and the formatter picks the right conversion for each
one. The only time I actually write anything other than %s is when I
need to specify floating-point formatting options, like precision
(e.g. %.3f), or scientific format vs. decimal, etc.

Then throw in the array formatters %(...%), and D format strings will
totally blow C's stdio out of the water.
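
To make that concrete, here is a small sketch using std.format.format
(the variable names are made up for illustration). It shows %s working
across types, an explicit precision, and the %(...%) array formatter,
including the %| marker that separates the element spec from the
delimiter:

	import std.format : format;
	import std.stdio : writeln;

	void main()
	{
		int i = 42;
		ulong big = 10_000_000_000;
		string name = "x";

		// %s takes its formatting from each argument's static type.
		writeln(format("%s %s %s", i, big, name)); // 42 10000000000 x

		// An explicit specifier is only needed for layout, e.g. precision.
		writeln(format("%.3f", 3.14159));          // 3.142

		// %( ... %) applies the inner spec to every element; the text
		// after the last spec is emitted between elements only.
		writeln(format("%(%s, %)", [1, 2, 3]));    // 1, 2, 3

		// %| marks explicitly where the delimiter starts.
		writeln(format("[%(%02x%| %)]", [10, 255])); // [0a ff]
	}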

> On the other hand this D program prints
> just "10" with no errors, ignoring the second x:
> 
> import std.stdio;
> void main() {
>     size_t x = 10;
>     writefln("%d", x, x);
> }
> 
> In a modern statically typed language I'd like such code to give a
> compile-time error.

This looks like a bug to me. Please file one. :)
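
For comparison, a hedged sketch of how the same mistake behaves with
std.format.format rather than writefln. As far as I know, format throws
a FormatException on leftover arguments when the format string is a
run-time value, and, assuming a Phobos that accepts the format string
as a template argument (format!"%d"(x)), the mismatch is rejected at
compile time. Treat both as assumptions about the library version you
have:

	import std.exception : assertThrown;
	import std.format : format, FormatException;

	void main()
	{
		size_t x = 10;

		// Run-time format string: the surplus argument is only
		// noticed when the call executes.
		assertThrown!FormatException(format("%d", x, x));

		// Compile-time format string (assumed available): the same
		// call does not compile, while the correct one does.
		static assert(!__traits(compiles, format!"%d"(x, x)));
		static assert( __traits(compiles, format!"%d"(x)));
	}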

[...]
> There are some situations where this doesn't help, but they are not
> common in idiomatic D code:
> 
> void main() {
>     int i, j;
>     for (i = 0; i < 10; i++) {
>         for (i = 0; i < 20; i++) {
>         }
>     }
> }

I don't think this particular error is compiler-catchable. Sometimes,
you *want* the nested loop to reuse the same index (though probably not
in exactly the formulation as above, most likely the inner loop will
omit the i=0 part). The compiler can't find such errors unless it reads
the programmer's mind.
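
A sketch of the legitimate case I have in mind (countRuns is a made-up
example, not from any real code): the inner loop deliberately carries
on with the outer loop's index, which is why a blanket "nested loop
reuses the index" diagnostic would produce false positives.

	import std.stdio : writeln;

	// Counts maximal runs of equal characters, e.g. "aaabbc" has 3.
	size_t countRuns(string s)
	{
		size_t runs = 0;
		size_t i;
		for (i = 0; i < s.length; )
		{
			immutable c = s[i];
			// Intentionally no i = 0 here: skip the rest of this
			// run using the same index as the outer loop.
			for (; i < s.length && s[i] == c; i++) {}
			runs++;
		}
		return runs;
	}

	void main()
	{
		writeln(countRuns("aaabbc")); // 3
	}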

> In D this is one case similar to variable shadowing, that the
> compiler doesn't help you with:
> 
> class Foo {
>     int x, y, z, w;
>     this(in int x_, in int y_, in int z_, in int w_) {
>         this.x = x_;
>         this.y = y_;
>         this.z = z;
>         this.w = w_;
>     }
> }

Yeah, this one bit me before. Really hard. I had code that looked like
this:

	class C {
		int x;
		this(int x) {
			x = f(x);	// ouch
		}
		int f(int x) { ... }
	}

This failed horribly, because inside the constructor x refers to the
parameter, so the member never got assigned. I rewrote the //ouch line
to:

	this.x = x;

But that is still very risky, since in a member function that doesn't
shadow x, the above line is equivalent to this.x = this.x.

Anyway, in the end I decided that naming member function arguments after
member variables is a Very Stupid Idea, and that it should never be
done. It would be nice if the D compiler rejected such code.

[...]
> Logic bugs:
[...]
> enum INPUT_VALUE = 2;
> void f(uint flags) {
>     if (flags | INPUT_VALUE) {}
> }
> 
> 
> I have just added it to Bugzilla:
> http://d.puremagic.com/issues/show_bug.cgi?id=10480
[...]

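Huh? Shouldn't that be (flags & INPUT_VALUE)? As written,
(flags | INPUT_VALUE) is nonzero for every value of flags, so the
condition is always true.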

How would the compiler catch such cases in general, though? I mean, like
in arbitrarily complex boolean expressions.
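
To spell out the quoted case, here is a small sketch, assuming the
intent was to test whether the INPUT_VALUE bit is set (orTest and
andTest are made-up names). OR-ing with a nonzero constant can never
give zero, which is at least one narrow pattern a compiler could
plausibly flag without understanding the whole expression:

	import std.stdio : writeln;

	enum INPUT_VALUE = 2;

	// The quoted form: true for every input.
	bool orTest(uint flags)  { return (flags | INPUT_VALUE) != 0; }

	// The presumed intent: true only when the bit is set.
	bool andTest(uint flags) { return (flags & INPUT_VALUE) != 0; }

	void main()
	{
		writeln(orTest(0));  // true
		writeln(andTest(0)); // false
		writeln(andTest(3)); // true
	}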

T

-- 
It said to install Windows 2000 or better, so I installed Linux instead.