Opinions: The Best and Worst of D (for a lecture/talk I intend to give)

Wed Jul 9 10:11:44 PDT 2014

On Wed, Jul 09, 2014 at 04:24:38PM +0000, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote:
> On Wednesday, 9 July 2014 at 14:51:41 UTC, Meta wrote:
> >One of the uglier things in D is also a long-standing problem with C
> >and C++, in that comparison of signed and unsigned values is allowed.
> 
> I would like that, if it would be implemented along this line:
> 
> /// Returns -1 if a < b, 0 if they are equal or 1 if a > b.
> /// this will always yield a correct result, no matter which numeric types are compared.
> /// It uses one extra comparison operation if and only if
> /// one type is signed and the other unsigned but the signed value is >= 0
> /// (that is what you need to pay for stupid choice of type).
[...]

Yeah, I don't see what's the problem with comparing signed and unsigned
values, as long as the result is as expected. Currently, however, this
code asserts, which is wrong:

	uint x = uint.max;
	int y = -1;
	assert(x > y);

> {
>    static if(Unqual!T == Unqual!U)

Nitpick: should be:

    static if(is(Unqual!T == Unqual!U))

[...]
>    else static if(isSigned!T && isUnsigned!U)
>    {
>       alias CommonType!(Unsigned!T, U) C;
>       return (a < 0) ? -1 : opCmp!(cast(C)a, cast(C)b);
>    }
>    else static if(isUnsigned!T && isSigned!U)
>    {
>       alias CommonType!(T, Unsigned!U) C;
>       return (b < 0) ? 1 : opCmp!(cast(C)a, cast(C)b);
>    }
[...]

Hmm. I wonder if there's a more efficient way to do this.

For the comparison s < u, where s is a signed value and u is an unsigned
value, whenever s is negative, the return value of opCmp must be
negative.  Assuming 2's-complement representation of integers, this
means we simply copy the MSB of s (i.e., the sign bit) to the result. So
we can implement s < u as:

	enum signbitMask = 1u << (s.sizeof*8 - 1); // this is a compile-time constant
	return (s - u) | (s & signbitMask); // look ma, no branches!

which would translate (roughly) to the assembly code:

	mov	eax, [<address of s>]
	mov	ebx, [<address of u>]
	mov	ecx, eax	; save the value of s for signbit extraction
	sub	eax, ebx	; s - u
	and	ecx, #$10000000	; s & signbitMask
	or	eax, ecx	; (s - u) | (s & signbitMask)
	(ret		; this is deleted if opCmp is inlined)

which avoids a branch hazard in the CPU pipeline.

Similarly, for the comparison u < s, whenever s is negative, then opCmp
must always be positive. So this means we copy over the *negation* of
the sign bit of s to the result. So we get this for u < s:

	enum signbitMask = 1u << (s.sizeof*8 - 1); // as before
	return (u - s) & ~(s & signbitMask); // look ma, no branches!

which translates roughly to:

	mov	eax, [<address of u>]
	mov	ebx, [<address of s>]
	sub	eax, ebx	; u - s
	and	ebx, #$10000000	; s & signbitMask
	not	ebx		; ~(s & signbitMask)
	and	eax, ebx	; (u - s) & ~(s & signbitMask)
	(ret		; this is deleted if opCmp is inlined)

Again, this avoid a branch hazard in the CPU pipeline.

In both cases, the first 2 instructions are unnecessary if the values to
be compared are already in CPU registers. The naïve implementation of
opCmp is just a single sub instruction (this is why opCmp is defined the
way it is, BTW), whereas the "smart" signed/unsigned comparison is 4
instructions long. The branched version would look something like this:

	mov	eax, [<address of u>]
	mov	ebx, [<address of s>]
	cmp	ebx, $#0
	jge	label1		; first branch
	mov	eax, $#FFFFFFFF
	jmp	label2		; 2nd branch
label1:
	sub	eax, ebx
label2:
	(ret)

The 2nd branch can be replaced with ret if opCmp is not inlined, but
requiring a function call to compare integers seems excessive, so let's
assume it's inlined, in which case the 2nd branch is necessary. So as
you can see, the branched version is 5 instructions long, and always
causes a CPU pipeline hazard.

So I submit that the unbranched version is better. ;-)

(So much for premature optimization... now lemme go and actually
benchmark this stuff and see how well it actually performs in practice.
Often, such kinds of hacks often perform more poorly than expected due
to unforeseen complications with today's complex CPU's. So for all I
know, I could've just been spouting nonsense above. :P)

T

-- 
Debian GNU/Linux: Cray on your desktop.