Strange counter-performance in an alternative `decimalLength9` function
Basile B.
b2.temp at gmx.com
Wed Feb 26 00:50:35 UTC 2020
So after reading the translation of RYU I was interested too see
if the decimalLength() function can be written to be faster, as
it cascades up to 8 CMP.
After struggling with bad ideas I finally find something that
looks nice:
- count the leading zero of the input
- switch() this count to have in the worst of the cases only 1
CMP (when the bit number possibly changes the digit count, e.g 9
-> 10 (when bsr(input) == 4)
after writing all the values allowing to know the cases where a
comparison is necessary...
min2 = 0b10
max2 = 0b11
min3 = 0b100
max3 = 0b111
...
...
min32 = 0b100.......0
max32 = 0b111.......1
...I finally write the "nice" thing.
---
#!dmd -boundscheck=off -O -release -inline
import std.stdio;
ubyte decimalLength9(const uint v)
{
if (v >= 100000000) { return 9; }
if (v >= 10000000) { return 8; }
if (v >= 1000000) { return 7; }
if (v >= 100000) { return 6; }
if (v >= 10000) { return 5; }
if (v >= 1000) { return 4; }
if (v >= 100) { return 3; }
if (v >= 10) { return 2; }
return 1;
}
ubyte fdecimalLength9(const uint v) pure nothrow
{
import core.bitop;
const ubyte lzc = cast(ubyte) (bsr(v) + 1);
ubyte result;
switch (lzc)
{
case 0 :
case 1 :
case 2 :
case 3 : result = 1; break;
case 4 : result = v >= 10 ? 2 : 1; break;
case 5 :
case 6 : result = 2; break;
case 7 : result = v >= 100 ? 3 : 2; break;
case 8 :
case 9 : result = 3; break;
case 10: result = v >= 1000 ? 4 : 3; break;
case 11:
case 12:
case 13: result = 4; break;
case 14: result = v >= 10000 ? 5 : 4; break;
case 15:
case 16: result = 5; break;
case 17: result = v >= 100000 ? 6 : 5; break;
case 18:
case 19: result = 6; break;
case 20: result = v >= 1000000 ? 7 : 6; break;
case 21:
case 22:
case 23: result = 7; break;
case 24: result = v >= 10000000 ? 8 : 7; break;
case 25:
case 26: result = 8; break;
case 27: result = v >= 100000000 ? 9 : 8; break;
case 28:
case 29:
case 30:
case 31:
case 32: result = 9; break;
default: assert(false);
}
return result;
}
private ubyte ffdecimalLength9(const uint v) pure nothrow
{
import core.bitop;
const ubyte lzc = cast(ubyte) (bsr(v) + 1);
static immutable pure nothrow ubyte function(uint)[33] tbl =
[
0 : (uint a) => ubyte(1),
1 : (uint a) => ubyte(1),
2 : (uint a) => ubyte(1),
3 : (uint a) => ubyte(1),
4 : (uint a) => a >= 10 ? ubyte(2) : ubyte(1),
5 : (uint a) => ubyte(2),
6 : (uint a) => ubyte(2),
7 : (uint a) => a >= 100 ? ubyte(3) : ubyte(2),
8 : (uint a) => ubyte(3),
9 : (uint a) => ubyte(3),
10: (uint a) => a >= 1000 ? ubyte(4) : ubyte(3),
11: (uint a) => ubyte(4),
12: (uint a) => ubyte(4),
13: (uint a) => ubyte(4),
14: (uint a) => a >= 10000 ? ubyte(5) : ubyte(4),
15: (uint a) => ubyte(5),
16: (uint a) => ubyte(5),
17: (uint a) => a >= 100000 ? ubyte(6) : ubyte(5),
18: (uint a) => ubyte(6),
19: (uint a) => ubyte(6),
20: (uint a) => a >= 1000000 ? ubyte(7) : ubyte(6),
21: (uint a) => ubyte(7),
22: (uint a) => ubyte(7),
23: (uint a) => ubyte(7),
24: (uint a) => a >= 10000000 ? ubyte(8) : ubyte(7),
25: (uint a) => ubyte(8),
26: (uint a) => ubyte(8),
27: (uint a) => a >= 100000000 ? ubyte(9) : ubyte(8),
28: (uint a) => ubyte(9),
29: (uint a) => ubyte(9),
30: (uint a) => ubyte(9),
31: (uint a) => ubyte(9),
32: (uint a) => ubyte(9),
];
return tbl[lzc](v);
}
void main()
{
import std.datetime.stopwatch, std.range, std.algorithm;
int s1, s2, s3;
benchmark!({ iota(100000000u).each!(a => s1 +=
decimalLength9(a+1)); })(10).writeln;
benchmark!({ iota(100000000u).each!(a => s2 +=
fdecimalLength9(a+1));})(10).writeln;
benchmark!({ iota(100000000u).each!(a => s3 +=
ffdecimalLength9(a+1));})(10).writeln;
assert(s1 == s2);
assert(s1 == s3);
}
---
Then bad surprise. Even with ldmd (so ldc2 basically) feeded with
the args from the script line. Maybe the fdecimalLength9 version
is slightly faster. Only *slightly*. Good news, I've lost my
time. So I try an alternative version that uses a table of
delegates instead of a switch (ffdecimalLength9) and surprise,
"tada", it is like **100x** slower then the two others.
How is that possible ?
More information about the Digitalmars-d-learn
mailing list