Opinions: The Best and Worst of D (for a lecture/talk I intend to give)

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Wed Jul 9 12:53:08 PDT 2014


On Wed, Jul 09, 2014 at 11:29:06AM -0700, H. S. Teoh via Digitalmars-d-learn wrote:
> On Wed, Jul 09, 2014 at 05:43:15PM +0000, Dominikus Dittes Scherkl via Digitalmars-d-learn wrote:
> > On Wednesday, 9 July 2014 at 17:13:21 UTC, H. S. Teoh via
> > Digitalmars-d-learn wrote:
[...]
> > >Often, such kinds of hacks often perform more poorly than expected
> > >due to unforeseen complications with today's complex CPU's. So for
> > >all I know, I could've just been spouting nonsense above. :P)
> > I don't see such a compiler change as a hack. It is a strong
> > improvement IMHO.
> 
> I was talking about using | and & to get rid of the branch in
> signed/unsigned comparison. As it turns out, the compare(uint,int)
> case seems far more costly than a simple compare-and-branch as you had
> it at the beginning. So at least that part of what I wrote is probably
> nonsense.  :P
> 
> But I can't say for sure until I actually run some benchmarks on it.
[...]

Hmph. I'm having trouble coming up with a fair benchmark, because I
realized that D doesn't actually have a way of expressing opCmp for
unsigned int's in a minimal way! The problem is that the function needs
to return int, but given two uints, their difference may be greater than
int.max, so simply subtracting them will not work. So the best I can
come up with is:

	int compare2(int x, uint y)
	{
		return (x < 0) ? -1 :
			(y > int.max) ? -1 :
			(x - y);
	}

which requires 2 comparisons.

Similarly, for the uint-int case:

	int compare2(uint x, int y)
	{
		return (y < 0) ? 1 :
			(x > int.max) ? 1 :
			(x - y);
	}

If you have a better implementation in mind, I'm all ears.

In any case, I went ahead and benchmarked the above two functions along
with my branchless implementations, and here are the results:

(with dmd -O -unittest:)

	non-branching compare(signed,unsigned): 5513 msecs
	branching compare(signed,unsigned): 5442 msecs
	non-branching compare(unsigned,signed): 5441 msecs
	branching compare(unsigned,signed): 5744 msecs
	Optimizer-thwarting value: 0

(with gdc -O3 -funittest:)

	non-branching compare(signed,unsigned): 516 msecs
	branching compare(signed,unsigned): 1209 msecs
	non-branching compare(unsigned,signed): 453 msecs
	branching compare(unsigned,signed): 756 msecs
	Optimizer-thwarting value: 0

(Ignore the last lines of each output; that's just a way to prevent gdc
-O3 from being over-eager and optimizing out the entire test so that
everything returns 0 msecs.)

Interestingly, with dmd, the branching compare for the signed-unsigned
case is faster than my non-branching one, but the order is reversed for
the unsigned-signed case. They're pretty close, though, and on some runs
the order of the latter case is reversed.

With gdc, however, it seem the non-branching versions are clearly
better, even in the unsigned-signed case, which I thought would be
inferior. So clearly, these results are very optimizer-dependent.

Keep in mind, though, that this may not necessarily reflect actual
performance when the compiler generates the equivalent code for the
built-in integer comparison operators, because in codegen the compiler
can take advantage of the CPU's carry and overflow bits, and can elide
the actual return values of opCmp. This may skew the results enough to
reverse the order of some of these cases.

Anyway, here's the code (for independent verification):

	int compare(int x, uint y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);
	
		// The (x|y) & signbitMask basically means that if either x is negative
		// or y > int.max, then the result will always be negative (sign bit
		// set).
		return (cast(uint)x - y) | ((x | y) & signbitMask);
	}
	
	unittest
	{
		// Basic cases
		assert(compare(5, 10u) < 0);
		assert(compare(5, 5u) == 0);
		assert(compare(10, 5u) > 0);
	
		// Large cases
		assert(compare(0, uint.max) < 0);
		assert(compare(50, uint.max) < 0);
	
		// Sign-dependent cases
		assert(compare(-1, 0u) < 0);
		assert(compare(-1, 10u) < 0);
		assert(compare(-1, uint.max) < 0);
	}
	
	int compare2(int x, uint y)
	{
		return (x < 0) ? -1 :
			(y > int.max) ? -1 :
			x - y;
	}
	
	unittest
	{
		// Basic cases
		assert(compare2(5, 10u) < 0);
		assert(compare2(5, 5u) == 0);
		assert(compare2(10, 5u) > 0);
	
		// Large cases
		assert(compare2(0, uint.max) < 0);
		assert(compare2(50, uint.max) < 0);
	
		// Sign-dependent cases
		assert(compare2(-1, 0u) < 0);
		assert(compare2(-1, 10u) < 0);
		assert(compare2(-1, uint.max) < 0);
	}
	
	int compare(uint x, int y)
	{
		enum signbitMask = 1u << (int.sizeof*8 - 1);
		static assert(signbitMask == 0x80000000);
		return ((x - y) & ~(x & signbitMask)) | ((cast(uint)y & signbitMask) >> 1);
	}
	
	unittest
	{
		// Basic cases
		assert(compare(0u, 10) < 0);
		assert(compare(10u, 10) == 0);
		assert(compare(10u, 5) > 0);
	
		// Large cases
		assert(compare(uint.max, 10) > 0);
		assert(compare(uint.max, -10) > 0);
	
		// Sign-dependent cases
		assert(compare(0u, -1) > 0);
		assert(compare(10u, -1) > 0);
		assert(compare(uint.max, -1) > 0);
	}
	
	int compare2(uint x, int y)
	{
		return (y < 0) ? 1 :
			(x > int.max) ? 1 :
			x - y;
	}
	
	unittest
	{
		// Basic cases
		assert(compare2(0u, 10) < 0);
		assert(compare2(10u, 10) == 0);
		assert(compare2(10u, 5) > 0);
	
		// Large cases
		assert(compare2(uint.max, 10) > 0);
		assert(compare2(uint.max, -10) > 0);
	
		// Sign-dependent cases
		assert(compare2(0u, -1) > 0);
		assert(compare2(10u, -1) > 0);
		assert(compare2(uint.max, -1) > 0);
	}
	
	void main()
	{
		import std.datetime;
		import std.stdio : writefln;
	
		enum numRepeat = 5;
	
		int dummy; // deoptimizer
	
		auto nobranch_su = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare(s, u);
			}
		});
		writefln("non-branching compare(signed,unsigned): %d msecs",
			nobranch_su[0].msecs);
	
		auto branch_su = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare2(s, u);
			}
		});
		writefln("branching compare(signed,unsigned): %d msecs",
			branch_su[0].msecs);
	
		auto nobranch_us = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare(u, s);
			}
		});
		writefln("non-branching compare(unsigned,signed): %d msecs",
			nobranch_us[0].msecs);
	
		auto branch_us = numRepeat.benchmark!(()
		{
			foreach (s; -10_000 .. 10_000)
			{
				foreach (uint u; 0 .. 20_000)
					dummy ^= compare2(u, s);
			}
		});
		writefln("branching compare(unsigned,signed): %d msecs",
			branch_us[0].msecs);
	
		writefln("Optimizer-thwarting value: %d", dummy);
	}


T

-- 
The best way to destroy a cause is to defend it poorly.


More information about the Digitalmars-d-learn mailing list