Ideas from Clang

bearophile bearophileHUGS at lycos.com
Sun Feb 19 12:19:59 PST 2012


A belated comment on the GoingNative 2012 talk "Defending C++ fom Murphy's Million Monkeys" by Chandler Carruth:
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Clang-Defending-C-from-Murphy-s-Million-Monkeys

The slides:
http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf

This talk shows some ways used by Clang to help the programmer spot or fix some mistakes in C++ code (according to Chandler, all bugs shown in the talk are real bugs found in production software). I think some of those ideas are interesting or useful for D too.

-----------------------------

Slides page 31 - 32, 14.48 in the video:

I have adapted the Clang example to this D code:


class BaseType {}
static int basetype;
class DerivedType : Basetype {}
void main() {}


The latest DMD gives this error:
test.d(3): Error: undefined identifier Basetype, did you mean variable basetype?


The idea here is to restrict the list of names to search for similar ones only among the class names, because here "basetype" is an int so it can't be what the programmer meant. There are few other groups of names for other situations.

This also reduces the work of the routines that look for similar names, because they need to work on a shorter list of possibilities.

-----------------------------

Slides page 47 - 48, 21.24 in the video:

C++ code:

static const long long DiskCacheSize = 8 << 30; // 8 Gigs

Clang gives:

% clang++ -std=c++11 -fsyntax-only overflow.cpp
overflow.cpp:1:42: warning: signed shift result (0x200000000) requires 35 
bits to represent, but 'int' only has 32 bits [-Wshift-overflow]
static const long long DiskCacheSize = 8 << 30; // 8 Gigs
                                       ~ ^  ~~

In D/DMD this compiles with no errors or warnings:

static const long diskCacheSize = 8 << 30;
static assert(diskCacheSize == 0);
void main() {}


This kind of overflow errors are important and common enough, so I think D too has to spot them, as Clang.


I have done a little test, and I've seen that with the latest Clang this C++ code gives no warnings:

const static unsigned int x = -1;

-----------------------------

Slides 57 - 58, 28.11 in the video:

C++ code:

void test(bool b, double x, double y) {
  if (b || x < y && x > 0) {
    // ...
  }
}


Clang gives:


% clang++ -std=c++11 -fsyntax-only -Wparentheses parentheses3.cpp
parentheses3.cpp:2:18: warning: '&&' within '||' [-Wlogical-op-
parentheses]
  if (b || x < y && x > 0) {
        ~~ ~~~~~~^~~~~~~~
parentheses3.cpp:2:18: note: place parentheses around the '&&' expression
to silence this warning
  if (b || x < y && x > 0) {
                 ^
           (             )
1 warning generated.


DMD compiles that code with no warnings. I think a similar warning is useful in D/DMD too.

-----------------------------

Slides 59 - 60, 29.17 in the video:

int test(bool b, int x, int y) {
  return 42 + b ? x : y;
}


Clang gives:


% clang++ -std=c++11 -fsyntax-only -Wparentheses parentheses4.cpp
parentheses4.cpp:2:17: warning: operator '?:' has lower precedence than 
'+'; '+' will be evaluated first [-Wparentheses]
  return 42 + b ? x : y;
         ~~~~~~ ^
parentheses4.cpp:2:17: note: place parentheses around the '+' expression 
to silence this warning
  return 42 + b ? x : y;
                ^
         (     )
parentheses4.cpp:2:17: note: place parentheses around the '?:' expression
to evaluate it first
  return 42 + b ? x : y;
                ^
              (        )
1 warning generated.


DMD compiles that code with no warnings.
(Maybe here Don thinks that accepting int+bool in the language rules is bad in the first place.)

-----------------------------

A little comment on slides 63 - 64: in D when I switch on an enumeration, I always use a final switch. I think use cases for nonfinal switches on enums are rare in D code.

-----------------------------

Slides 65 - 66, 31.57 in the video:


constexpr int arr_size = 42;
constexpr int N = 44;
void f(int);
int test() {
  int arr[arr_size];
  // ...
  f(arr[N]);
  // ...
  if (N < arr_size) return arr[N];
  return 0;
}


Clang gives:

% clang++ -std=c++11 -fsyntax-only deadcode.cpp
deadcode.cpp:7:5: warning: array index 44 is past the end of the array
(which contains 42 elements) [-Warray-bounds]
  f(arr[N]);
    ^   ~
deadcode.cpp:5:3: note: array 'arr' declared here
  int arr[arr_size];
  ^
1 warning generated.


Similar D code:

enum int arr_size = 42;
enum int N = 44;
void foo(int) {}
int test() {
    int[arr_size] arr;
    foo(arr[N]);
    if (N < arr_size)
        return arr[N];
    return 0;
}
void main() {}


DMD is able to catch the array overflow bugs, but it shows three errors:

test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
test.d(8): Error: array index 44 is out of bounds arr[0 .. 42]


Clang doesn't give an error at the "return arr[N];" line because it's inside the "if(N<arr_size)" branch. This is nice (despite this code looks silly, it's real code reduced from a meaningful example, the enums come from conditional compilation, etc).

As a related example, in D I have written code similar to:


void foo(ulong x) {
    if (x < 5_000)
        int y = x;
}
void main() {}


Here DMD asks for a cast(int) or to!int:
test.d(3): Error: cannot implicitly convert expression (x) of type ulong to int

But a bit smarter compiler/language is able to tell there's no need for casts or conversions there.

-----------------------------

The lock annotations in slides 70 - 74 are nice.

-----------------------------

Regarding slides 76 - 79, is it useful to catch simple bugs like this statically?


void main() {
    auto a = new int[10];
    a[10] = 2;
}

-----------------------------

Slides 80 - 81, 41.46 in the video:

Clang is able to catch this bug at runtime in C++ code:


#include <iostream>
int test(int n) {
  return 42 << n;
}
int main() {
  std::cerr << "Bogus: " << test(42)
            << "\n";
}



% clang++ -g -std=c++0x undef.cpp -o undef -fcatch-undefined-behavior
% gdb -x =(echo -e "r\nbt") --args ./undef
...
Program received signal SIGILL, Illegal instruction.
0x00000000004007fe in test (n=42) at undef.cpp:3
3         return 42 << n;
#0  0x00000000004007fe in test (n=42) at undef.cpp:3
#1  0x000000000040084b in main () at undef.cpp:6


Here Chandler says that this runtime instrumentation is light and fast. This is good compiler sugar for future D compilers (maybe for LDC2).

-----------------------------

Taken one at a time those Clang features are small, but all at the same time are impressive. Hopefully Clang will poke GCC developers to improve the diagnostic features of their compiler.

I think some of those ideas are good for D too, so I'd like to open few D enhancement requests based on this post, but I have to think more on them. Suggestions and comments are welcome.

Bye,
bearophile


More information about the Digitalmars-d mailing list