More bugs found in OS code

Wed Nov 2 20:08:58 PDT 2011

The now usual article that advertises the PVS-Studio tool that shows plenty of (depressing) bugs found in already debugged source code of widely used C/C++ open source projects:

http://software.intel.com/en-us/articles/90-errors-in-open-source-projects/

There is a Reddit discussion too, but I find it useless:
http://www.reddit.com/r/programming/comments/lxjrb/examples_of_errors_detected_in_various_opensource/

In the Reddit discussion someone is free to list in D how many of the 91 bugs:
- Are not applicable or are not normally done in D;
- Are statically caught by D/DMD;
- Are always caught at runtime by DMD in non release mode;
- Are planned/discussed to be statically avoided or statically caught;
- Are planned/discussed to be always caught at runtime by DMD in non release mode.

Here I list and comment about some of the more interesting parts. Below I have divided the problems in 5 groups (I have not included all the bug groups shown in the article).

A]========================================

Example 3. Fennec Media Project project. Complex expression.

uint32 CUnBitArrayOld::DecodeValueRiceUnsigned(uint32 k) 
{
  ...
  while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
    Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31])) {}
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The 'm_nCurrentBitIndex' variable is modified while being used twice at single sequence point. MACLib unbitarrayold.cpp 78

------------------------

Example 4. Miranda IM project. Complex expression.

short ezxml_internal_dtd(ezxml_root_t root,
  char *s, size_t len)
{
  ...
  while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') {
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The 's' variable is modified while being used twice between sequence points.msne zxml.c 371

------------------------

Currently DMD accepts this code:

x = x++;

Currently D is using the C semantics, this means the result of that line of code is undefined, this means you don't know what it will do once compiled by different D compilers or even with different optimization level on the same compiler. This means writing such code in C (and currently in D) is _always_ a bug (because a program with undefined semantics is often useless. There are other sources of undefined semantics, but the less there are, the better it is). I don't understand why C compilers don't refuse such code statically with an error. In my opinion this is not acceptable.

Two basic solutions for D:
1) Define exactly what that code does in D. Walter has expressed few times his desire for this solution.
2) Turn similar lines of code into compile-time errors (and maybe accept them forever if the -d switch is used, but I am not sure this is a good idea).

The first solution is good because it makes it simpler to port C code to D, it allows C/C++/Java programmers to use that code still. But such C code has undefined results, so I don't see a great advantage here (it's useful still to port Java code).

But lately I am slowly leaning toward the second solution, because even if D defines exactly the semantics of code like this, so it gives the same result on all D compilers:

(*(n = ++s + strspn(s, EZXML_WS)) && *n != '>')

I don't want to read similar code in D programs I have to debug or modify. Even if it's unambiguous for the D language, it's a bit too much hard for me to understand. Go language has chosen this second solution.

B]========================================

Curretly D2 accepts the usage of & and | among booleans:

void main() {
    bool a, b;
    auto r1 = a & b;
    static assert(is(typeof(r1) == bool));
    auto r2 = a | b;
    static assert(is(typeof(r2) == bool));
}

But I am not sure this is useful enough to justify the risks that gives.

Using booleans as integers is useful when I have to count them and in few other situations. But I don't remember the last time I've had to use bitwise or/bitwise and on boolean values, I think I have never had to do this. So maybe it's better to forbid this. The advantage of forbidding this operation is that it avoids several mistakes caused by operator precedence like (!x & y). Comments on this are welcome.

C]========================================

Example 5. IPP Samples project. Priorities of ?: and | operations.

vm_file* vm_file_fopen(...)
{
  ...
  mds[3] = FILE_ATTRIBUTE_NORMAL |
           (islog == 0) ? 0 : FILE_FLAG_NO_BUFFERING;
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. vm vm_file_win.c 393

------------------------

Example 6. Newton Game Dynamics project. Priorities of ?: and * operations.

dgInt32 CalculateConvexShapeIntersection (...)
{
  ...
  den = dgFloat32 (1.0e-24f) *
        (den > dgFloat32 (0.0f)) ?
          dgFloat32 (1.0f) : dgFloat32 (-1.0f);
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '*' operator. physics dgminkowskiconv.cpp 1061

D]========================================

Currently this is not caught by D, it prints "12":

import std.stdio;
void main() {
    writefln("%d%d", 1, 2, 3);
}

I'd like this code to give a compile-time error.

If this is not possible, then I'd like this code to give a run-time error.

The third possibility is to print "123", but this is bad enough.

Printing just "12" and ignoring 3, as done by DMD 2.056, is not acceptable, in my opinion.

E]========================================

Example 1. Shareaza project. Value range of char type.

void CRemote::Output(LPCTSTR pszName)
{

  ...
  CHAR* pBytes = new CHAR[ nBytes ];
  hFile.Read( pBytes, nBytes );
  ...
  if ( nBytes > 3 && pBytes[0] == 0xEF &&
       pBytes[1] == 0xBB && pBytes[2] == 0xBF )
  {
    pBytes += 3;
    nBytes -= 3;
    bBOM = true;
  }
  ...
}

The error was found through the V547 diagnostic: Expression 'pBytes [ 0 ] == 0xEF' is always false. The value range of signed char type: [-128, 127]. Shareaza remote.cpp 350

------------------------

Example 3. VirtualDub project. Unsigned type is always >= 0.

typedef unsigned short wint_t;
...
void lexungetc(wint_t c) {
  if (c < 0)
    return;
   g_backstack.push_back(c);
}

The error was found through the V547 diagnostic: Expression 'c < 0' is always false. Unsigned type value is never < 0. Ami lexer.cpp 225

The "c < 0" condition is always false because the variable of the unsigned type is always above or equal to 0.

------------------------

Example 7. QT project. Dangerous loop.

bool equals( class1* val1, class2* val2 ) const{
{
  ...
  size_t size = val1->size();
  ...
  while ( --size >= 0 ){
    if ( !comp(*itr1,*itr2) )
      return false;
    itr1++;
    itr2++;
  }
  ...
}

The error was found through the V547 diagnostic: Expression '--size >= 0' is always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154

The (--size >= 0) condition is always true, since the size variable has the unsigned type. It means that if two sequences being compared are alike, we will get an overflow that will in its turn cause Access Violation or other program failures.

------------------------

Example 8. MySQL project. Error in condition.

enum enum_mysql_timestamp_type
str_to_datetime(...)
{
  ...
  else if (str[0] != 'a' || str[0] != 'A')
    continue; /* Not AM/PM */
  ...
}

The error was found through the V547 diagnostic: Expression 'str [0] != 'a' || str [0] != 'A'' is always true. Probably the '&&' operator should be used here. clientlib my_time.c 340

The condition is always true because the character is always either not equal to 'a' or to 'A'. This is the correct check:

else if (str[0] != 'a' && str[0] != 'A')

Comments are welcome.

Bye,
bearophile