Serious problem with opCmp

kov_serg kov_serg at freemail.ru
Sun Mar 2 11:37:04 PST 2008


I am starting to learn D language. And first thing I can not still understand is how "opCmp" works indeed?

If we look in documentation we find: define opCmp(S) or opCmp(S*) and it will works

int opCmp(Object o); 
  Compare with another Object obj. 
Returns:
  this < obj   < 0 
  this == obj    0 
  this > obj   > 0 

Here is a test code based on D documentation example from "array.html" 

/* http://www.digitalmars.com/d/2.0/arrays.html

Using Structs or Unions as the KeyType

Structs or unions can be used as the KeyType. For this to work, the struct or union definition must define the following member functions:

    * hash_t toHash()
    * int opEquals(S) or int opEquals(S*)
    * int opCmp(S) or int opCmp(S*)

Note that the parameter to opCmp and opEquals can be either the struct or union type, or a pointer to the struct or untion type.

For example:
*/
struct S
{
    int a, b;

    hash_t toHash() { return a + b; }

    int opEquals(S s)
    {
	return a == s.a && b == s.b;
    }

    int opCmp(S* s)
    {
	if (a == s.a)
	    return b - s.b;
	return a - s.a;
    }
}
/*
The implementation may use either opEquals or opCmp or both. Care should be taken so that the results of opEquals and opCmp are consistent with each other when the struct/union objects are the same or not.
*/

// define same struct but with opCmp(T) oposite to opCmp(T*)
struct SS
{
    int a, b;

    hash_t toHash() { return a + b; }

    int opEquals(SS s)
    {
	return a == s.a && b == s.b;
    }

    int opCmp(SS s)
    {
	if (a == s.a)
	    return b - s.b;
	return a - s.a;
    }
}

void chk(S *a, S *b) {
	int  z=a.opCmp(b);
	char c0=z<0?'<': z>0?'>' : '=';
	char cc=a<b?'<': a>b?'>' : '=';
	printf("%11d %c %11d (%11d%c0)\n",a.b,cc,b.b,z,c0);
}

void chk(SS a, SS b) {
	int  z=a.opCmp(b);
	char c0=z<0?'<': z>0?'>' : '=';
	char cc=a<b?'<': a>b?'>' : '=';
	printf("%11d %c %11d (%11d%c0)\n",a.b,cc,b.b,z,c0);
}

// let's test that will be
int main(char[][] args) {
    printf("opCmp(T)\n");
	SS ss1; ss1.a=0; ss1.b=int.min; 
	SS ss2; ss2.a=0; ss2.b=int.max; 
	chk(ss1,ss2);
	chk(ss2,ss1);

    printf("opCmp(T*) -- why it works this way?\n");
	S s1; s1.a=0; s1.b=int.min; 
	S s2; s2.a=0; s2.b=int.max; 
	chk(&s1,&s2);
	chk(&s2,&s1);

	return +1;
}

I am expecting the same behaviour but I was surprized:
# dmd -O -inline -release main.d
# Digital Mars D Compiler v2.010
# Copyright (c) 1999-2008 by Digital Mars written by Walter Bright
# Documentation: http://www.digitalmars.com/d/index.html
# ...
This code outputs:
opCmp(T)
-2147483648 >  2147483647 (          1>0)
 2147483647 < -2147483648 (         -1<0)
opCmp(T*) -- why it works this way?
-2147483648 <  2147483647 (          1>0)
 2147483647 > -2147483648 (         -1<0)

First 3 lines show predictable result and I expect this error.
But second 3 lines are unexpected to me. Surprize. They works without error. Is this a some kind of fixup?

This is well known bug with compare functions:
int cmp_a(int a,int b) {
	if (a<b) return -1;
	if (b<a) return +1;
	return 0;
}
int cmp_b(int a,int b) {
	return a-b;
}

function cmp_b holds hidden bug. It caused by arithmetic overflow. 
This type of compare function may used only in assembler code where we have FLAGS of operation done. 
But in high level language the only possible code is cmp_a.
If looks on input values plane and separete on 2 areas A and B

<pre>[code]
int.min
+--------0--------+--> a
|0    B  : \    A |
|  0     :   \    |
|    0   :   B \  |
| B    0 :       \|
0........0........0 zero
|\       : 0    B |
|  \ B   :   0    |
|    \   :     0  |
| A    \ :  B    0|
+--------0--------+ int.max
|               
v b
[/code]</pre>

Will see that:
function cmp_a will be correct with all acceptable values a and b.
function cmb_b will correct only in area B, and will return oposite result for area A.

It's extremly strange D shows 2 different behaviours simultaneously.
I think we should not hide this problem by some kind of fixup. It may couse serious problems in future.
More over it should be removed from examples and changed on predictable code. Because existing code contains hidden problem and undocumented workaround.
Anyway we must do something with this.




More information about the Digitalmars-d mailing list