Lints, Condate and bugs

Wed Oct 27 17:14:12 PDT 2010

bearophile wrote:
> Commercial lints for C are probably able to find other kinds of bugs too. Even
> Splint (a free lint for C) is probably able to do more than your list (but
> you need to add some semantic annotations to the C code if you want Splint to
> do that).

When you find yourself adding semantic annotations to the C code, or providing
extra semantic rules to the static analyzer, what that really says is that
the type abstraction abilities of the language being analyzed are deficient.

> The static analyzer of Clang is supposed to have a
> really low amount of false positives, so low that I think it may be
> configured to submit bug reports automatically :-)

We'll see. A lot of organizations treat false positives as "bugs" simply because 
it's easier to deal with them that way.

> I am quite sure that if I run a good C/C++ lint on the D front-end it may
> catch a large number (hundreds) of bugs.

I don't believe it, but feel free to prove me wrong. I'll be happy to fix any 
bugs found that way, but not false positives.

> But even if you are right, that you don't
> write bugs that a simple rule-based analyzer is able to catch, the world is
> full of people that don't have your level of experience in C/C++ coding. So
> for them a lint may be useful.

I found lint useful for maybe a year or so, and then it just stopped finding any 
problems in my code. Not that there weren't bugs, not at all, but I had simply 
learned to not do the kinds of things lint detects.

>> 1. Memory allocation errors - failure to free, dangling pointers, redundant
>> frees
>>
>> 1. Garbage collection.
> 
> The GC avoids a large number of bugs. For the situations where the GC is
> unfit Ada shows the usage of more than one class of pointers, with different
> capabilities. This may reduce the bug rate (but introduces some extra
> complexity). In the past, for example, I have suggested to statically
> differentiate pointers to GC-managed memory from pointers to manually managed
> memory (so they are two different kinds of pointers), because they are quite
> different
> (example: putting tags inside a GC-managed pointer is a bad idea). You
> answered me that this introduces too much complexity in the language.

Microsoft's Managed C++ does exactly this. While a technical success, it is a
complete failure with regard to pleasing programmers.

I'm also very familiar with using multiple pointer types, which are a necessity 
for DOS programming. I'm sick of it. It sucks. I don't want to go back to it, 
and I don't know anyone who does.

>> 4. Memory corruption, such as buffer overflows
>>
>> 4. Array bounds checking, and safe mode in general, solves this.
>
> Array bounds checking slows down code a lot. Often more than 20%. See below
> for my answer to point 8.

That's why it's selectable with a switch.
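
To illustrate what "selectable with a switch" means, here is a minimal sketch in C rather than D. It leans on the standard assert macro, which compiles to nothing when NDEBUG is defined (for example by passing -DNDEBUG to the compiler). The IntSlice type and slice_get function are hypothetical names made up for this example; they are not how D implements its arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice type: a pointer plus a length. */
typedef struct {
    const int *ptr;
    size_t len;
} IntSlice;

/* Checked element access. The assert is the bounds check; building
   with -DNDEBUG removes it, so the release build pays nothing. */
static int slice_get(IntSlice s, size_t i)
{
    assert(i < s.len);
    return s.ptr[i];
}
```

The same source gives a checked build for testing and an unchecked one for shipping, so the cost of the check is paid only when asked for.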

> (Static analysis in recent JavaVMs is able to infer that many of those
> checks are useless and remove them with no harm for the code).

I know about data flow analysis, and I was able to implement such checking for 
array bounds. But it is of only limited effectiveness.
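
A rough illustration of why the effectiveness is limited, again in C with hypothetical function names. This only sketches the general idea and is not a description of any particular compiler's analysis. In the first function the index is the loop counter, so the check can be proven redundant. In the second the index is read from data, so nothing is known about it until run time and the check has to stay.

```c
#include <assert.h>
#include <stddef.h>

/* The index is the loop counter and the loop condition already
   guarantees i < n, so the check below is provably redundant. */
static long sum_all(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(i < n);
        total += a[i];
    }
    return total;
}

/* The index comes from another array. Its value is unknown at
   compile time, so this check cannot be removed statically. */
static long sum_selected(const int *a, size_t n,
                         const size_t *idx, size_t m)
{
    long total = 0;
    for (size_t k = 0; k < m; k++) {
        assert(idx[k] < n);
        total += a[idx[k]];
    }
    return total;
}
```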

>> 6. Failure to deal with error returns
>>
>> 6. Exceptions solve this
>
> Yet I don't see exceptions used much in Phobos yet :-] Example: in Python
> list.index() throws an exception if the item is missing, while indexOf
> returns -1.

That is because there is a different idea of what an "error" is when indexing a 
list.
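
One way to read the distinction: a search that finds nothing has still answered the question it was asked. A small sketch in C, where index_of and contains are hypothetical helpers written for this example and not Phobos code:

```c
#include <stddef.h>

/* Position of value in a[0..n), or -1 if it is absent.
   "Not found" is an expected answer here, not a failure. */
static ptrdiff_t index_of(const int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == value)
            return (ptrdiff_t)i;
    return -1;
}

/* The caller treats absence as ordinary control flow. */
static int contains(const int *a, size_t n, int value)
{
    return index_of(a, n, value) >= 0;
}
```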

>> 8. Signed/unsigned mismatching
>>
>> 8. The only successful solution to this I've seen is Java's simply not having
>> unsigned types. Analysis tools just produce false positives.
> 
> When I write C code I always keep this GCC warning active, and I find it
> useful.

The rate of false positives for such warnings makes them unsuitable for
inclusion in the language.
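
For illustration, here are two C functions with the same signed/unsigned comparison shape. A warning such as GCC's -Wsign-compare may flag both, but only the second is a bug. The function names are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Likely false positive: i is never negative, so comparing it
   with the unsigned length n behaves exactly as intended. */
static int count_positive(const int *a, size_t n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (a[i] > 0)
            count++;
    return count;
}

/* Real bug of the same shape: a negative value is converted to a
   huge unsigned one, so -1 is reported as not below the limit. */
static int below_limit(int value, unsigned limit)
{
    return value < limit;
}
```

A tool that cannot tell these two apart ends up either silencing the real bug or nagging about the harmless loop.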

> In your list you are forgetting integral overflows too, which for example
> static analysis in SPARK is often able to avoid (but you need a lot of time and
> brain to write such code, so it's not a general solution for D or other
> languages, it's fit only for special code).

I did that deliberately, as I haven't seen any focus on them in static analysis
tools. But I do know that they are of particular interest to you.

>> I also suspect that it would result in a drastic performance problem
>> (remember, Python is 100x slower than native code),
> Python is 100-130x slower than native code only if your native code is
> numerically-intensive and really well written for performance, otherwise in
> most cases you don't end up more than 20-50x slower.

Even 20% slower, let alone 20 times slower, is unacceptable.

> CLisp and OCaML implementations use tagged integers, and from what I have
> seen you can't expect CLisp code that uses a lot of integers to be more than
> 2-3 times slower than D code.

Such slowdowns are simply not acceptable for D.

> I have used Delphi and C# with integer overflow checks enabled, and if you
> use them in C# code you usually don't see the code more than 50% slower even
> if the code is using integer numbers all the time :-) Usually the percentage
> is lower. In practice this is not a problem, especially if you are debugging
> your program :-)

50% slower than C++ means that people will not switch from C++ to D. I think a 
total 5% slowdown relative to C++ is about the max acceptable.

In particular, I do not view integer overflows as remotely big enough of a 
problem to justify such massive slowdowns. Yes, I've had an overflow bug here 
and there over the years, but nothing remotely as debilitating as uninitialized 
data bugs or pointer bugs.
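
To make the cost being argued about concrete, here is a rough sketch in portable C of what a checked signed addition amounts to when done by hand. The checked_add helper is a hypothetical name, and this is not a claim about how Delphi, C#, or any D compiler would implement such checks. The point is only that every addition picks up extra comparisons and a branch.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false
   (leaving *out untouched) if the sum would not fit in an int.
   The tests are arranged so that they cannot overflow themselves. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```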