"Expressive vs. permissive languages" and bugs

Sat Oct 23 06:12:17 PDT 2010

I don't think I have shown this article here yet: "Expressive vs. permissive languages: Is that the question?" by Yannick Moy:

First page, with reader comments:

http://www.eetimes.com/design/eda-design/4008921/Expressive-vs-permissive-languages--Is-that-the-question-

Single page, without reader comments:

http://www.eetimes.com/General/DisplayPrintViewContent?contentItemId=4008921

This article doesn't say anything particularly new (and I think it's a bit biased toward Ada), but it says it in a nice, compact way, and it discusses a topic that interests D designers, because D is designed to avoid some of the typical bugs of C code.

In the section "A simple example in C/Java/Ada", a D version of that function could look just like the Java code. But it's probably better to also add a precondition that checks that conf is not null and that num_proc is within bounds, raising an exception otherwise.
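C has no exceptions, so a C rendering of that precondition has to report failure some other way. Below is a minimal sketch assuming an error-code convention; `struct config`, `MAX_PROCS`, `get_proc_id` and the status values are hypothetical names, not the article's code. It checks at run time the two things the Ada types guarantee statically: a non-null argument and an index in range.

```c
#include <stddef.h>

#define MAX_PROCS 16  /* hypothetical upper bound for num_proc */

/* Hypothetical status codes standing in for exceptions. */
enum status { STATUS_OK = 0, STATUS_NULL_ARG = 1, STATUS_BAD_INDEX = 2 };

/* Hypothetical configuration record (not the article's type). */
struct config {
    int proc_ids[MAX_PROCS];
};

/* Precondition checks first: non-null pointers, then index bounds.
   Only when both hold is the array actually read. */
enum status get_proc_id(const struct config *conf, int num_proc, int *out)
{
    if (conf == NULL || out == NULL)
        return STATUS_NULL_ARG;
    if (num_proc < 0 || num_proc >= MAX_PROCS)
        return STATUS_BAD_INDEX;
    *out = conf->proc_ids[num_proc];
    return STATUS_OK;
}
```

Every caller now has to look at the returned status, which is exactly the kind of discipline the Ada version gets from its types instead.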

The Ada version of the code takes an argument that can't be null, and the number of items in the array can't be too big. So the article says:

>this makes a total of five possible errors in C, three in Java, and two in Ada.<

Ranged integers (range violations are a special case of integer overflow) are a good idea, and so, probably, is a not-null attribute for pointers/references.
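C has no ranged integer types, but the idea can be approximated by funneling every value through a checked constructor. A minimal sketch, assuming a hypothetical index type limited to 0..15 (similar in spirit to an Ada `range 0 .. 15` subtype); `proc_index`, `proc_index_make` and `proc_index_add` are invented names. The addition is done in a wider type so the range check itself can't overflow.

```c
#include <stdbool.h>

/* Hypothetical bounds for the ranged type. */
enum { PROC_INDEX_MIN = 0, PROC_INDEX_MAX = 15 };

/* Wrapper struct so a proc_index is not silently an int. */
typedef struct { int value; } proc_index;

/* Stores v in *out only if it lies inside the range. */
bool proc_index_make(int v, proc_index *out)
{
    if (v < PROC_INDEX_MIN || v > PROC_INDEX_MAX)
        return false;
    out->value = v;
    return true;
}

/* Range-checked addition: widen to long long so a + delta cannot
   overflow int before we compare it against the bounds. */
bool proc_index_add(proc_index a, int delta, proc_index *out)
{
    long long sum = (long long)a.value + delta;
    if (sum < PROC_INDEX_MIN || sum > PROC_INDEX_MAX)
        return false;
    out->value = (int)sum;
    return true;
}
```

Unlike Ada, nothing stops C code from writing `p.value` directly; the guarantee is only as good as the discipline of always going through the helpers.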

See also the comments about the different kinds of pointers, which have different capabilities meant to avoid bugs.

>According to a study reported in 2003 by Andy German on military systems varying in size from 3,000 lines of code to 300,000 lines of code, these languages are also those in which programmers make less errors, four per thousand lines on average for SPARK, between 4.8 and 50 per thousand lines for Ada, between 12.5 and 500 (sic) per thousand lines for C.<

I presume the bug rate of well-written D code lies somewhere between those of C and Ada: D avoids some of the bugs of C programs, but it's not as strict as Ada (see for example bug http://d.puremagic.com/issues/show_bug.cgi?id=3999 ).

-------------------

The comments after the article look even more interesting than the article :-)

>Plus it compiles despite a crucial bug: your parameter res should be a Proc** and you should be assigning the result of the allocation to *res. <

While trying to make the C code safer, that commenter added more code to the C program and introduced another bug, one the compiler doesn't catch. Ada isn't a succinct language, but all the extra verbiage you add to an Ada program actually helps make the code more consistent. So it's not the same thing.

One of the answers is very nice and argues for strong typedefs, stronger enums, and ranged types:

>In your revised version of the C code, the types Result_t and uint8_t are compatible with each other in expressions, despite their different purposes. Indeed, they are compatible with every other enum and integer type and floats under most circumstances, under various confusing and inconsistent silent promotion rules. If you are lucky then you will sometimes get a warning, but you can't rely on it. Even the MISRA checker allows a Result_t to be assigned to a uint8_t, even though this almost certainly makes no sense. And in any any C-derived language (MISRA or not) you have to use one of a small fixed set of integer types that almost never have the appropriate range for the quantity in question. And as well as having inappropriate ranges, quantities that should never be assignment compatible or mixable in expressions (without explicit conversions) can be silently confused with each other.<

>This compiles, passes MISRA checking, and makes no sense. The if test is never true (it should say "ActiveState[n] == INACTIVE)"). There isn't a real type tState, just a bunch of constants. INACTIVE, being the first one, has the value 0. ActiveState, used in an expression, is merely a pointer. Pointers can be compared with 0. This is all fundamentally bad. <

The D compiler is able to catch that bug, yeah :-)
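In C the usual workaround for the mixing problem described in the first quote is to wrap each quantity in its own one-member struct: distinct struct types are not assignment-compatible and can't appear in arithmetic, so the compiler rejects those confusions. A minimal sketch; `Result_t` and `RetryCount_t` here are illustrative stand-ins, not the commenter's actual types.

```c
#include <stdint.h>

/* Each quantity gets its own struct type: a "strong typedef" in C. */
typedef struct { uint8_t code; } Result_t;       /* hypothetical status code */
typedef struct { uint8_t count; } RetryCount_t;  /* hypothetical counter */

Result_t make_result(uint8_t code) { Result_t r = { code }; return r; }
RetryCount_t make_retries(uint8_t n) { RetryCount_t c = { n }; return c; }

/* Getting the raw number back out must be spelled out explicitly. */
uint8_t result_code(Result_t r) { return r.code; }

/* Each of these would now be a compile error:
   RetryCount_t c = make_result(3);   incompatible struct types
   uint8_t x = make_result(3);        struct assigned to integer
   int y = make_result(3).code + make_retries(1);   struct in arithmetic */
```

This gives distinct types but no range checking; it covers only the "silently mixable" half of the complaint.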

>Also, you cannot just dismiss returning a pointer to a local variable as a beginner's mistake. It can be done in less obvious ways, for one thing. But the main point is that it is obviously dangerous and should simply be forbidden. Ada has rules that are designed to prevent such a mistake from even compiling. They make it less permissive than C in this respect. This is a good thing.<

I agree. See also bug: http://d.puremagic.com/issues/show_bug.cgi?id=3925
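A common way to avoid returning a pointer to a local in C is to let the caller own the storage. A minimal sketch; `format_id` and its buffer-size parameter are hypothetical names chosen for illustration. The bad version is kept in a comment: it typically compiles (often with just a warning) and hands the caller a pointer to dead stack memory.

```c
#include <stdio.h>

/* BUG, shown only for contrast: buf stops existing when the function
   returns, so the caller gets a dangling pointer.
char *format_id_bad(int id) {
    char buf[16];
    snprintf(buf, sizeof buf, "id-%d", id);
    return buf;
}
*/

/* Safer: the caller provides the buffer and its size. snprintf never
   writes more than len bytes and NUL-terminates when len > 0. */
char *format_id(int id, char *buf, size_t len)
{
    snprintf(buf, len, "id-%d", id);
    return buf;
}
```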

> if (getuid() != 0 && getuid == 0) {
>   ErrorF("only root!");
> 
>   exit(1);
> 
> }

> for (int i=0; i != MAX_ELEMENTS; i++);
> {
> floatValues_l[i] = 0.0f;
> }

The D compiler is able to catch both bugs, yeah! :-)
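For comparison, here is a minimal C sketch of the two snippets with the mechanical bugs fixed; the value of `MAX_ELEMENTS` and the helper names `running_as_root` and `clear_values` are hypothetical, and `ErrorF` is left out because it isn't a standard function. In the first snippet `getuid == 0` compares the function's address with a null pointer constant, which is always false, so the check never fires. As quoted, the condition is contradictory even with the parentheses added, so the sketch only shows the corrected comparison, not the original author's intent.

```c
#include <unistd.h>

#define MAX_ELEMENTS 8  /* hypothetical array size */

float floatValues_l[MAX_ELEMENTS];

/* Fixed: compare the value returned by getuid(), not the address
   of the getuid function itself. */
int running_as_root(void)
{
    return getuid() == 0;
}

/* Fixed: no stray ';' after the for header, so the braces really are
   the loop body and every element gets cleared. */
void clear_values(void)
{
    for (int i = 0; i != MAX_ELEMENTS; i++) {
        floatValues_l[i] = 0.0f;
    }
}
```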

----------------

Now I'd like the D language to become a bit stricter, so the compiler can catch more bugs: integer-related bugs, enum bugs, some pointer bugs, and so on.

An example:

In C, the order in which foo() and bar() are evaluated in the following line is unspecified. In D it's better to specify it, to give D code well-defined semantics and to make porting D code across different compilers a bit safer:

auto z = foo(x) + bar(y);

On the other hand, if both foo() and bar() are strongly pure functions, the D compiler must be free to act as a C compiler does and choose the most efficient order of evaluation, because with no side effects the order can't change the result.

This is better than both C# and C: you gain the speed of C without losing safety compared to C#.
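In today's C the portable way to get a fixed order is to split the expression into separate statements, since each full statement is sequenced before the next. A minimal sketch; `foo`, `bar` and the `trace` buffer are hypothetical, with side effects added only so the order is observable. With such side effects, writing `foo(x) + bar(y)` as a single expression could legally leave the trace in either order.

```c
/* Hypothetical side-effecting functions: each records that it ran.
   The trace buffer is only big enough for this small demo. */
char trace[8];
int trace_len = 0;

int foo(int x) { trace[trace_len++] = 'f'; return x * 2; }
int bar(int y) { trace[trace_len++] = 'b'; return y + 1; }

/* Unspecified in C: either call may happen first.
   int z = foo(x) + bar(y);                                */

/* Fixed order: each full statement is sequenced before the next. */
int sum_left_to_right(int x, int y)
{
    int a = foo(x);  /* always runs first */
    int b = bar(y);  /* always runs second */
    return a + b;
}
```

For pure functions this splitting is pointless, which is the case where the compiler should keep its freedom.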

----------------

Several sources I have read seem to show that programs written in Ada contain fewer bugs than programs written in almost any other language (except Ada subsets like SPARK, etc.). And it's generally known that debugging often takes a significant percentage of the whole programming time.

Then why isn't Ada used more?
- Maybe programmers who learned a C-like language first dislike Ada's Pascal-like syntax.
- Maybe because Ada programs are a little "logorrhoeic": you need to write a lot of code.
- Maybe because Ada isn't widespread, and professional programmers don't want to spend years of their lives studying and using a language that offers little hope of being hired elsewhere.
- Maybe because Ada is a pernickety language: every detail needs to be correct before your program compiles.

But in my opinion that list misses an important point that I have not seen in those articles: Ada doesn't look very good for exploratory programming. I think it's not well suited to inventing new coding ideas, or even to just finding a working solution to a programming problem. When I have an algorithmic problem, I want to use most of my mind to think about the problem, not to coddle the compiler; otherwise I'm less likely to actually find a solution.

So a less fussy language, like Python, may be a good thing for the first phase: exploration (think about using Matplotlib from the Python shell to plot data and get ideas) and the invention of a solution algorithm. Later, a good language has to offer ways to make the code less buggy and more rigorous, like Ada code (for example an attribute that switches the code from dynamic typing to static typing, etc.).

I do this a bit when I program in D2: first I write the D2 code without const/pure/immutable, then when the code works I add those attributes.

------------------

For finding bugs in C code I have found a tool I didn't know about, "mygcc"; a variant of it could be written to find bugs in D code too:

http://mygcc.free.fr/overview.html

It's a kind of metalanguage: it lets you define rules, using an ugly but compact syntax, which are then applied to C code to catch bugs.

From the tests it seems to work well enough, and the rules are compact, but their syntax doesn't look very good yet.

Bye,
bearophile