Lints, Condate and bugs

Walter Bright newshound2 at digitalmars.com
Wed Oct 27 17:14:12 PDT 2010


bearophile wrote:
> Commercial lints for C are probably able to find other kinds of bugs too. Even
> Splint (a free lint for C) is probably able to do more than your list (but
> you need to add some semantic annotations to the C code if you want Splint to
> do that).

When you find yourself adding semantic annotations to the C code, or providing 
extra semantic rules to the static analyzer, what that really is saying is that 
the abstract type abilities of the language being analyzed are deficient.


> The static analyzer of Clang is supposed to have a
> really low number of false positives, so low that I think it may be
> configured to submit bug reports automatically :-)

We'll see. A lot of organizations treat false positives as "bugs" simply because 
it's easier to deal with them that way.


> I am quite sure that if I run a good C/C++ lint on the D front-end it may
> catch a large number (hundreds) of bugs.

I don't believe it, but feel free to prove me wrong. I'll be happy to fix any 
bugs found that way, but not false positives.


> But even if you are right, that you don't
> write bugs that a simple rule-based analyzer is able to catch, the world is
> full of people that don't have your level of experience in C/C++ coding. So
> for them a lint may be useful.

I found lint useful for maybe a year or so, and then it just stopped finding any 
problems in my code. Not that there weren't bugs, not at all, but I had simply 
learned to not do the kinds of things lint detects.


>> 1. Memory allocation errors - failure to free, dangling pointers, redundant
>> frees
>> 1. Garbage collection.
> 
> The GC avoids a large number of bugs. For the situations where the GC is
> unfit Ada shows the usage of more than one class of pointers, with different
> capabilities. This may reduce the bug rate (but introduces some extra
> complexity). In the past, for example, I have suggested to statically
> differentiate pointers to GC-managed memory from pointers to manually managed
> memory (so they are two different kinds of pointers), because they are quite
> different (example: putting tags inside a GC-managed pointer is a bad idea). You
> answered me that this introduces too much complexity in the language.

Microsoft's Managed C++ does exactly this. While a technical success, it is a 
complete failure in regards to pleasing programmers.

I'm also very familiar with using multiple pointer types, which are a necessity 
for DOS programming. I'm sick of it. It sucks. I don't want to go back to it, 
and I don't know anyone who does.


>> 4. Memory corruption, such as buffer overflows
>> 4. Array bounds checking, and safe mode in general, solves this.
> Array bounds checking slows down code a lot. Often more than 20%. See below
> for my answer to point 8.

That's why it's selectable with a switch.
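
To make "selectable with a switch" concrete, here is a minimal C++ sketch of a bounds check that a build flag can compile out. The macro name BOUNDS_CHECK and the helper at_checked are hypothetical, chosen only for illustration; this is not how the D compiler implements its own switch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical build switch: compile with -DBOUNDS_CHECK=0 to remove the test.
#ifndef BOUNDS_CHECK
#define BOUNDS_CHECK 1
#endif

// Returns v[i]. The index is validated only when BOUNDS_CHECK is nonzero.
inline int at_checked(const std::vector<int>& v, std::size_t i) {
#if BOUNDS_CHECK
    if (i >= v.size())
        throw std::out_of_range("index out of bounds");
#endif
    return v[i];
}
```

With the default setting, at_checked(v, v.size()) throws. With the check compiled out, the same call is an unchecked access, which is the trade the switch offers.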

> (Static analysis in recent JavaVMs is able to infer that many of those
> checks are useless and removes them with no harm for the code).

I know about data flow analysis, and I was able to implement such checking for 
array bounds. But it is of only limited effectiveness.
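
As a rough illustration of what such an analysis can prove, the C++ sketch below writes out by hand the transformation an optimizer might perform: a per-iteration check is replaced by one check before the loop. The function names are hypothetical, and this is not the implementation referred to above nor that of any JVM.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checks every index, as a naive bounds-checked build would.
inline long sum_checked(const std::vector<int>& v, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= v.size())
            throw std::out_of_range("index out of bounds");
        total += v[i];
    }
    return total;
}

// One check up front: once n <= v.size() is known, i < n implies
// i < v.size(), so the check inside the loop is provably redundant.
inline long sum_hoisted(const std::vector<int>& v, std::size_t n) {
    if (n > v.size())
        throw std::out_of_range("index out of bounds");
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
```

The proof works here because the index is a simple loop counter. An index that is computed or loaded from data, such as v[table[i]], generally cannot be proven in range this way, which is one reason the technique is of limited effectiveness.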


>> 6. Failure to deal with error returns
>> 6. Exceptions solve this
> Yet I don't see exceptions used much in Phobos yet :-] Example: in Python
> list.index() throws an exception if the item is missing, while indexOf
> returns -1.

That is because there is a different idea of what an "error" is when indexing a 
list.
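
The two conventions can be put side by side in a short C++ sketch. Both function names are hypothetical stand-ins: index_of mirrors the "return -1" style, and index_or_throw mirrors the Python list.index() style, in which a missing item is reported as an exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Search-style lookup: a missing item is an ordinary outcome, reported as -1.
inline long index_of(const std::vector<int>& v, int item) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == item)
            return static_cast<long>(i);
    return -1;
}

// Python-style lookup: a missing item is treated as an error and throws.
inline std::size_t index_or_throw(const std::vector<int>& v, int item) {
    const long i = index_of(v, item);
    if (i < 0)
        throw std::invalid_argument("item not in list");
    return static_cast<std::size_t>(i);
}
```

Which one is appropriate depends on whether "not found" counts as an error for the caller, which is the difference in viewpoint described above.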




>> 8. Signed/unsigned mismatching
>> 8. The only successful solution to this I've seen is Java's simply not having
>> unsigned types. Analysis tools just produce false positives.
> 
> When I write C code I always keep this GCC warning active, and I find it
> useful.

The rate of false positives for such warnings makes them unsuitable for 
inclusion in the language.
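
For readers unfamiliar with the mismatch in point 8, here is a minimal C++ sketch of what a mixed signed/unsigned comparison does. The function names are hypothetical, and the explicit cast in less_as_converted spells out the conversion that C and C++ apply implicitly when an int is compared with an unsigned.

```cpp
// What `a < b` does for int a and unsigned b: a is first converted to
// unsigned, so a negative value becomes a very large positive one.
inline bool less_as_converted(int a, unsigned b) {
    return static_cast<unsigned>(a) < b;
}

// Comparison by mathematical value: any negative a is less than any b.
inline bool less_by_value(int a, unsigned b) {
    return a < 0 || static_cast<unsigned>(a) < b;
}
```

less_as_converted(-1, 1u) is false while less_by_value(-1, 1u) is true. When the signed operand can never be negative, the two agree, so a warning on that comparison is a false positive.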


> In your list you are forgetting integral overflows too, which for example
> static analysis in SPARK is often able to avoid (but you need a lot of time and
> brain to write such code, so it's not a general solution for D or other
> languages, it's fit only for special code).

I did that deliberately, as I haven't seen any focus on them in static analysis 
tools. But I do know that it is of particular interest to you.


>> I also suspect that it would result in a drastic performance problem
>> (remember, Python is 100x slower than native code),
> Python is 100-130x slower than native code only if your native code is
> numerically-intensive and really well written for performance, otherwise in
> most cases you don't end up more than 20-50x slower.

Even 20% slower, let alone 20 times slower, is unacceptable.


> CLisp and OCaML implementations use tagged integers, and from what I have
> seen you can't expect CLisp code that uses a lot of integers to be more than
> 2-3 times slower than D code.

Such slowdowns are simply not acceptable for D.


> I have used Delphi and C# with integer overflow checks and if you use them in
> C# code you usually don't see the code more than 50% slower even if the code is
> using integer numbers all the time :-) Usually the percentage is lower. In
> practice this is not a problem, especially if you are debugging your program
> :-)

50% slower than C++ means that people will not switch from C++ to D. I think a 
total 5% slowdown relative to C++ is about the max acceptable.

In particular, I do not view integer overflows as anywhere near a big enough 
problem to justify such massive slowdowns. Yes, I've had an overflow bug here 
and there over the years, but nothing remotely as debilitating as uninitialized 
data bugs or pointer bugs.
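
To show where the cost of overflow checking comes from, here is a minimal C++ sketch of a checked addition written with portable comparisons. The helper name checked_add is hypothetical, and real implementations usually rely on compiler or hardware support instead.

```cpp
#include <limits>

// Returns true and stores a + b in `out` when the sum fits in an int.
// Returns false, leaving `out` untouched, when the sum would overflow.
inline bool checked_add(int a, int b, int& out) {
    const int kMax = std::numeric_limits<int>::max();
    const int kMin = std::numeric_limits<int>::min();
    if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b))
        return false;
    out = a + b;
    return true;
}
```

Every checked addition carries extra comparisons and a branch on top of the add itself. How much that costs in practice depends on the compiler and the workload, which is what the figures quoted above disagree about.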

