Strange counter-performance in an alternative `decimalLength9` function

Thu Feb 27 09:33:28 UTC 2020

On Wednesday, 26 February 2020 at 00:50:35 UTC, Basile B. wrote:
> So after reading the translation of RYU I was interested too 
> see if the decimalLength() function can be written to be 
> faster, as it cascades up to 8 CMP.

Perhaps you could try something like this.

<code>
int decimalDigitLength(ulong n) {
	if (n < 10000)
		if (n < 100)
			return n < 10 ? 1 : 2;
		else
			return n < 1000 ? 3 : 4;
	else
		if (n < 100000000)
			if (n < 1000000)
				return n < 100000 ? 5 : 6;
			else
				return n < 10000000 ? 7 : 8;
		else
			if (n < 1000000000000)
				if (n < 10000000000)
					return n < 1000000000 ? 9 : 10;
				else
					return n < 100000000000 ? 11 : 12;
			else
				if (n < 10000000000000000)
					if (n < 100000000000000)
						return n < 10000000000000 ? 13 : 14;
					else
						return n < 1000000000000000 ? 15 : 16;
				else
					if (n < 1000000000000000000)
						return n < 100000000000000000 ? 17 : 18;
					else
						return n < 10000000000000000000 ? 19 : 20;
}										
</code>

This uses at most 6 compares for any 64 bit number and only 3 for 
the most common small numbers less than 10000.

I was glad to see that with ldc at run.dlang.io using the -O3 
optimization I could change the function signature to match yours 
and the compiler eliminated all the unreachable dead code for 
larger values. The compiler produced the following assembler

<code>
	.section	.text.ubyte 
onlineapp.decimalLength9(uint),"axG", at progbits,ubyte 
onlineapp.decimalLength9(uint),comdat
	.globl	ubyte onlineapp.decimalLength9(uint)
	.p2align	4, 0x90
	.type	ubyte onlineapp.decimalLength9(uint), at function
ubyte onlineapp.decimalLength9(uint):
	.cfi_startproc
	cmpl	$9999, %edi
	ja	.LBB1_5
	cmpl	$99, %edi
	ja	.LBB1_4
	cmpl	$10, %edi
	movb	$2, %al
	sbbb	$0, %al
	retq
.LBB1_5:
	cmpl	$99999999, %edi
	ja	.LBB1_9
	cmpl	$999999, %edi
	ja	.LBB1_8
	cmpl	$100000, %edi
	movb	$6, %al
	sbbb	$0, %al
	retq
.LBB1_4:
	cmpl	$1000, %edi
	movb	$4, %al
	sbbb	$0, %al
	retq
.LBB1_9:
	cmpl	$1000000000, %edi
	movb	$10, %al
	sbbb	$0, %al
	retq
.LBB1_8:
	cmpl	$10000000, %edi
	movb	$8, %al
	sbbb	$0, %al
	retq
.Lfunc_end1:
	.size	ubyte onlineapp.decimalLength9(uint), .Lfunc_end1-ubyte 
onlineapp.decimalLength9(uint)
	.cfi_endproc
</code>

for the same body with signature ubyte decimalLength9(uint n).

This may be faster than your sequential comparison function 
depending upon the distribution of numbers. In real applications, 
small numbers are far more common so the reduced number of 
compares for those values should be beneficial in most cases.