Improving Compiler Error Messages

Sun May 2 02:51:48 PDT 2010

>The next dmd update is getting the fruits of this.<

Thank you for your interesting article.

>Lately, the clang compiler project has revived interest in improving compiler error diagnostic messages:<

"Competition" is useful. I have seen a few other things in GCC changing because of the Clang competition. The recently introduced GCC plug-ins and the -flto compilation switch also seem to be an effect of Clang's presence.

>Back in the 1980's, I thought I'd do this one better and have the compiler also emit the offending source code line with a ^ under where it went wrong, like:< [...] >Secondly, it ate up 3 lines of valuable vertical real estate rather than one, meaning that other useful messages got scrolled off the screen.<
>The Digital Mars C and C++ compilers still do this, but I dropped it for the D programming language compiler. Nobody remarked upon the absence of this long standing feature, so I figured it was the right decision to abandon it.<

In 2010 people no longer care about text scrolling off the command line; even basic editors capture the error messages.

GCC 4.5 has recently added column numbers:
>The -fshow-column option is now on by default. This means error messages now have a column associated with them.<

So this is an example of an error message that GCC 4.5 generates for C code:
test.c:28:6: error: 'struct <anonymous>' has no member named 'seed'

The C# compiler, Clang, and GCC show the column of the error.
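A minimal sketch of how a column number can be turned back into the caret display discussed in the article. The helper name formatDiagnostic and the source line passed to it are invented for illustration; this is not how GCC, Clang or dmd implement it.

```javascript
// Hypothetical helper: formats a diagnostic in the file:line:column style,
// followed by the source line and a caret under the reported column.
function formatDiagnostic(file, line, column, message, sourceLine) {
  const header = `${file}:${line}:${column}: error: ${message}`;
  // Columns are assumed to be 1-based. Tabs are kept as tabs so the caret
  // stays aligned with the source line; every other character becomes a space.
  const pad = sourceLine.slice(0, column - 1).replace(/[^\t]/g, " ");
  return [header, sourceLine, pad + "^"].join("\n");
}

const text = formatDiagnostic(
  "test.c", 28, 6,
  "'struct <anonymous>' has no member named 'seed'",
  "    p.seed = 1;" // invented source line, not from the original program
);
console.log(text);
```

With the column available, an editor can do the same thing without spending any extra lines of terminal output.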

>The more enterprising of these folks will build their own language to prove their point.<

You have to respect their efforts, because young people have to err to learn and become wise, and because sometimes, against all odds, they find alternative solutions that work well enough, even where wise old people have failed to find any. Human history is full of examples of this.

>The result is that it works great as long as there are no mistakes in the source code. As soon as there is one, the error diagnostics tend to be gibberish, the location of the error is often off by several lines, and one mistake results in a cascade of increasingly obscure messages.<

Error messages in Python are OK. But the whole syntax of Python is designed around the idea of no semicolons.

You also have to take into account the time wasted in D programming adding semicolons when the compiler shows you errors caused by their absence :-)

>And so the semicolon persists.<

Some of the languages developed in recent years seem to disagree with you: Scala and Go seem to avoid semicolons.

>Spell Checking This appears to have been first tried by the Clang group.<

Nope, Mathematica has had this for ages. (And having seen it in Mathematica, I suggested this feature to you two years ago.)
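A minimal sketch of the general idea behind such spelling suggestions: compare the unknown identifier with the names in scope and propose the closest one. It uses the plain Levenshtein distance; the helper names editDistance and suggest and the threshold of 2 are assumptions for illustration, not what Mathematica, Clang or dmd actually do.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping two rows.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: returns the closest known name, or null when no
// name is within maxDistance edits of the unknown identifier.
function suggest(unknown, knownNames, maxDistance = 2) {
  let best = null;
  let bestDistance = maxDistance + 1;
  for (const name of knownNames) {
    const d = editDistance(unknown, name);
    if (d < bestDistance) {
      best = name;
      bestDistance = d;
    }
  }
  return best;
}

const hint = suggest("sead", ["seed", "size", "next"]);
if (hint !== null) console.log(`undefined identifier sead, did you mean ${hint}?`);
```

The threshold matters: if it is too generous the compiler proposes unrelated names, which is worse than proposing nothing.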

-------------------

Walter Bright:

>Reading the first one only generally is a reaction to compilers giving nonsense cascaded errors afterwards. If the multiple errors really are errors, then it's convenient to get them fixed with one pass through the edit/compile cycle.<

I agree. But it's hard to find a compiler that good. I think the C# and Java compilers are among the few that seem good enough that looking past the first error is worth it.
GCC and (so far) dmd give error messages that usually make looking past the first error a waste of time.

>Yes, it's done in JavaScript. It doesn't work very well.<

JavaScript, like most languages, was designed too quickly, and it contains some traps and design errors. (Even D, which is several years old, has probably introduced several warts in its final rush. I hope some of them will be fixed.)

In JavaScript there is semicolon insertion:
>It's implemented by the parser running along until it hits a syntax error, at which point it rewinds a little, inserts a semicolon in a likely place, and tries again.<

Here you can read more about this:
http://www.mozilla.org/js/language/js20-2000-07/rationale/syntax.html

This code:

return {
	JavaScript : "fantastic"
};

Is different from:

return
{
	JavaScript : "fantastic"
};

This is how JS reads that:

return; // Semicolon inserted, believing the statement has finished. Returns undefined
{ // Considered to be an anonymous block, doing nothing
	JavaScript : "fantastic"
}; // The remaining semicolon is treated as an empty statement

So JS programming guides advise programmers to always use semicolons...
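The two fragments can be checked directly by wrapping each one in a function. The function names are invented for this sketch.

```javascript
// Brace on the same line as return: the object literal is returned.
function braceOnSameLine() {
  return {
    JavaScript : "fantastic"
  };
}

// Brace on the next line: a semicolon is inserted right after return,
// so the block below is never reached and the result is undefined.
function braceOnNextLine() {
  return
  {
    JavaScript : "fantastic"
  };
}

console.log(braceOnSameLine()); // an object with the JavaScript property
console.log(braceOnNextLine()); // undefined
```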

But even if JS got the design of its semicolon-related feature wrong, there are other ways to design it. I have not seen people complain much about bugs caused by missing semicolons in Scala.

If you just remove the semicolons from D you will probably cause problems, as in JavaScript. To make this idea work you have to adapt other parts of the syntax to the change. And you don't have to insert semicolons the way JS does. You can interpret newlines as semicolons and introduce explicit line continuation symbols, as in Python. In Python and Delight a colon is also used to denote the start of a block. This makes the syntax good enough again for both the compiler and the eyes of the programmer (the colon is redundant, but it's there for the programmer's eyes, and for some editors too).

In a program most statements end at a newline, so to reduce typing and errors it's a good idea to swap the syntax: instead of an explicit syntax to end lines (the semicolon), it's better to introduce an explicit line continuation (like \ in Python).
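A minimal sketch of that rule: a newline ends a statement unless the line ends with a backslash. The function name logicalLines is hypothetical, and the sketch deliberately ignores string literals, comments, bracket nesting and significant indentation, all of which a real lexer has to handle.

```javascript
// Hypothetical sketch: split source text into logical statements, treating
// a newline as the terminator unless the line ends with a backslash.
function logicalLines(source) {
  const statements = [];
  let parts = [];
  for (const raw of source.split("\n")) {
    const line = raw.trim();
    if (line.endsWith("\\")) {
      // Explicit continuation: drop the backslash and keep collecting.
      parts.push(line.slice(0, -1).trim());
      continue;
    }
    parts.push(line);
    const statement = parts.join(" ").trim();
    if (statement !== "") statements.push(statement); // skip blank lines
    parts = [];
  }
  // A continuation on the very last line is kept as a final statement.
  if (parts.length > 0) statements.push(parts.join(" ").trim());
  return statements;
}

const demo = logicalLines(
  "total = price * quantity \\\n    + shipping\nprint(total)\n"
);
console.log(demo);
```

Unlike the JS approach, nothing is guessed here: the programmer states the rare case (continuation) and the lexer assumes the common one (end of statement).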

If semicolons are so good for the compiler, then there can be ways to fix this small problem (I was planning to write this bug report even before reading this latest article of yours):
http://d.puremagic.com/issues/show_bug.cgi?id=4144

Bye,
bearophile