Opinions: The Best and Worst of D (for a lecture/talk I intend to give)

Wed Jul 9 11:29:06 PDT 2014

On Wed, Jul 09, 2014 at 05:43:15PM +0000, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote:
> On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via
> Digitalmars-d-learn wrote:
[..]
> >Yeah, I don't see what's the problem with comparing signed and unsigned
> >values, as long as the result is as expected. Currently, however, this
> >code asserts, which is wrong:
> >
> >	uint x = uint.max;
> >	int y = -1;
> >	assert(x > y);
>
> Yes, this is really bad.
> But last time I got the response that this is so to be compatible with
> C.  That is what I really thought was the reason why D throw away
> balast from C, to fix bugs.

I think the slogan was that if something in D looks like C, then it
should either have C semantics or not compile. According to this logic,
the only recourse here is to prohibit comparison of signed with
unsigned, which I don't think is workable because there are many valid
use cases for it (plus, it will break a ton of code and people will be
very unhappy).

I don't like the current behaviour, though. It just reeks of wrong in so
many different ways. If you *really* want semantives like the above, you
really should be casting y to unsigned so that it's clear what exactly
you're trying to achieve.

[...]
> >Hmm. I wonder if there's a more efficient way to do this.
> I'm sure. But I think it should be done at the compiler, not in a
> library.

Obviously, yes. But I wasn't thinking about implementing opCmp in the
library -- that would be strange since ints, of all things, need to have
native compiler support. I was thinking more about how the compiler
would implement "safe" signed/unsigned comparisons.

[...]
> >So I submit that the unbranched version is better. ;-)
> I don't think so, because the branch will only be taken if the signed
> type is >= 0 (in fact unsigned). So if the signed/unsigned comparison
> is by accident, you pay the extra runtime. But if it is intentional
> the signed value is likely to be negative, so you get a correct result
> with no extra cost.

Good point. Moreover, I have discovered multiple bugs in my proposed
implementation; the correct implementation should be as follows:

	int compare(int x, uint y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);

		// The (x|y) & signbitMask basically means that if either x is negative
		// or y > int.max, then the result will always be negative (sign bit
		// set).
		return (cast(uint)x - y) | ((x | y) & signbitMask);
	}

	unittest
	{
		// Basic cases
		assert(compare(5, 10u) < 0);
		assert(compare(5, 5u) == 0);
		assert(compare(10, 5u) > 0);

		// Large cases
		assert(compare(0, uint.max) < 0);
		assert(compare(50, uint.max) < 0);

		// Sign-dependent cases
		assert(compare(-1, 0u) < 0);
		assert(compare(-1, 10u) < 0);
		assert(compare(-1, uint.max) < 0);
	}

	int compare(uint x, int y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);
		return ((x - y) & ~(x & signbitMask)) | ((cast(uint)y & signbitMask) >> 1);
	}

	unittest
	{
		// Basic cases
		assert(compare(0u, 10) < 0);
		assert(compare(10u, 10) == 0);
		assert(compare(10u, 5) > 0);

		// Large cases
		assert(compare(uint.max, 10) > 0);
		assert(compare(uint.max, -10) > 0);

		// Sign-dependent cases
		assert(compare(0u, -1) > 0);
		assert(compare(10u, -1) > 0);
		assert(compare(uint.max, -1) > 0);
	}

Using gdc -O3, I managed to get a very good result for
compare(int,uint), only 5 instructions long.

However, for compare(uint,int), there is the annoying special case of
compare(uint.max, -1), which can only be fixed by the hack
... | ((y & signbitMask) >> 1).  Unfortunately, this makes it 11
instructions long, which is unacceptable. So it looks like a simple
compare and branch would be far better in the compare(uint,int) case --
it's far more costly to avoid the branch than to live with it.

> Even better for constants, where the compiler can not only evaluate
> expressions like (uint.max > -1) correct, but it should optimize them
> completely away!

Actually, with gdc -O3, I found that the body of the above unittests got
completely optimized away at compile-time, so that the unittest body is
empty in the executable! So even with a library implementation the
compiler was able to maximize performance. DMD left the assert calls in,
but then it's not exactly known for generating optimal code anyway.

> >(So much for premature optimization... now lemme go and actually
> >benchmark this stuff and see how well it actually performs in
> >practice.
> Yes, we should do this.
>
> >Often, such kinds of hacks often perform more poorly than expected
> >due to unforeseen complications with today's complex CPU's. So for
> >all I know, I could've just been spouting nonsense above. :P)
> I don't see such a compiler change as a hack. It is a strong
> improvement IMHO.

I was talking about using | and & to get rid of the branch in
signed/unsigned comparison. As it turns out, the compare(uint,int) case
seems far more costly than a simple compare-and-branch as you had it at
the beginning. So at least that part of what I wrote is probably
nonsense.  :P

But I can't say for sure until I actually run some benchmarks on it.

T

-- 
"The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."