Can this implementation of Damm algorithm be optimized?

Thu Feb 9 13:43:08 PST 2017

Dne 9.2.2017 v 22:29 Nestor via Digitalmars-d-learn napsal(a):

> On Thursday, 9 February 2017 at 20:46:06 UTC, Daniel Kozak wrote:
>> Maybe you can try use static array instead of dynamic
>> static immutable ubyte[10][10] QG10Matrix = ...
>
> I shaved it to this to discard unneccessary time-consuming functions:
>
> static immutable ubyte[10][10] QG10Matrix = [
>   [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
>   [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
>   [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
>   [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
>   [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
> ];
>
> static immutable string number =
> "0123456789012345678901234567890123456789012345678901234567890123456789";
>
> static int charToInt(char chr) {
>   return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1;
> }
>
> ubyte checkDigit(string str) {
>   ubyte tmpdigit;
>   foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
>   return tmpdigit;
> }
>
> void main() {
>   auto digit = checkDigit(number);
> }
>
> I even tried making checkDigit static, but surprisingly this increased 
> average execution time by 1ms.
>
> Anyway, the previopus version (modified to benefit from a little 
> optimization too) is almost as performant, even though it includes 
> multiplication and sum:
>
> static immutable char[] QG10Matrix =
>   "03175986427092154863420687135917509834266123045978" ~
>   "36742095815869720134894536201794386172052581436790";
>
> static immutable string number =
> "0123456789012345678901234567890123456789012345678901234567890123456789";
>
> static int charToInt(char chr) {
>   return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1;
> }
>
> char checkDigit(string str) {
>   char tmpdigit = '0';
>   foreach(chr; str) tmpdigit =
>     QG10Matrix[charToInt(chr) + (charToInt(tmpdigit) * 10)];
>   return tmpdigit;
> }
>
> void main() {
>   auto digit = checkDigit(number);
> }
>
> I compiled both with -inline -noboundscheck -release and the 
> multidimensional array version does perform 1ms faster for a couple 
> hundred runs, but I expected the difference to be much more noticeable.
>
> Any idea of what might be happening here?
Did you try it with different backends? llvm (ldc), gcc(gdc)?