Can this implementation of Damm algorithm be optimized?
Cym13 via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Thu Feb 9 12:56:15 PST 2017
On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:
> On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow
> wrote:
>> On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
>>> I was trying to port C code from the article in Wikiversity
>>> [1] to D, but I'm not sure this implementation is the most
>>> efficient way to do it in D, so suggestions to optimize it
>>> are welcome:
>>>
>>> import std.stdio;
>>>
>>> static immutable char[] QG10Matrix =
>>> "03175986427092154863420687135917509834266123045978" ~
>>> "36742095815869720134894536201794386172052581436790";
>>>
>>> char checkDigit(string str) {
>>>   char tmpdigit = '0';
>>>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
>>>   return tmpdigit;
>>> }
>>
>> Well, one thing is you can probably reduce them from chars to
>> just bytes; instead of having to subtract, you can add at the
>> end. Although unless you're working with VERY large input
>> you won't see a difference.
>>
>> Actually, since you're also multiplying by 10, you can
>> incorporate that into the table too... (although a mixin might
>> be better for the conversion than doing it by hand)
>>
>>
>> static immutable char[] QG10Matrix = [
>> 00,30,10,70,50,90,80,60,40,20,
>> 70,00,90,20,10,50,40,80,60,30,
>> 40,20,00,60,80,70,10,30,50,90,
>> 10,70,50,00,90,80,30,40,20,60,
>> 60,10,20,30,00,40,50,90,70,80,
>> 30,60,70,40,20,00,90,50,80,10,
>> 50,80,60,90,70,20,00,10,30,40,
>> 80,90,40,50,30,60,20,00,10,70,
>> 90,40,30,80,60,10,70,20,00,50,
>> 20,50,80,10,40,30,60,70,90,00];
>>
>> char checkDigit(string str) {
>>   char tmpdigit = 0;
>>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + tmpdigit];
>>   return cast(char)(tmpdigit / 10 + '0');
>> }
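[As a sketch of the "generate the table instead of transcribing it by hand" idea above: rather than a string mixin, plain CTFE can build the premultiplied table from the original 100-character string at compile time. The helper names here are illustrative, assuming the standard Damm operation table quoted earlier.]

```d
import std.stdio;

// The 100-digit Damm operation table from the post, flattened row-major
// (row = interim digit, column = next input digit).
enum string QG10 =
    "03175986427092154863420687135917509834266123045978" ~
    "36742095815869720134894536201794386172052581436790";

// Build the premultiplied table (each entry times 10) at compile time,
// so the hand-written 00,30,10,... listing is not needed.
ubyte[100] premultiply(string m) {
    ubyte[100] t;
    foreach (i, c; m) t[i] = cast(ubyte)((c - '0') * 10);
    return t;
}

static immutable ubyte[100] QG10Premul = premultiply(QG10);

char checkDigit(string str) {
    ubyte tmp = 0; // interim digit, kept premultiplied by 10
    foreach (chr; str) tmp = QG10Premul[(chr - '0') + tmp];
    return cast(char)(tmp / 10 + '0');
}

unittest {
    // 572 has Damm check digit 4; appending it makes the number validate.
    assert(checkDigit("572") == '4');
    assert(checkDigit("5724") == '0');
}

void main() {
    writeln(checkDigit("572")); // prints 4
}
```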
>
>
>
> OK I changed the approach using a multidimensional array for
> the matrix so I could ditch arithmetic operations altogether,
> but curiously after measuring a few thousand runs of both
> implementations through avgtime, I see no noticeable
> difference. Why?
>
> import std.stdio;
>
> static immutable ubyte[][] QG10Matrix = [
> [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
> [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
> [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
> [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
> [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
> ];
>
> static int charToInt(char chr) {
> scope(failure) return -1;
> return cast(int)(chr - '0');
> }
>
> ubyte checkDigit(string str) {
>   ubyte tmpdigit;
>   foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
>   return tmpdigit;
> }
>
> enum {
> EXIT_SUCCESS = 0,
> EXIT_FAILURE = 1,
> }
>
> int main(string[] args) {
>   scope(failure) {
>     writeln("Invalid arguments. You must pass a number.");
>     return EXIT_FAILURE;
>   }
>   assert(args.length == 2);
>   ubyte digit = checkDigit(args[1]);
>   if(digit == 0) writefln("%s is a valid number.", args[1]);
>   else {
>     writefln("%s is not a valid number (but it would be, appending digit %s).",
>       args[1], digit);
>   }
>
>   return EXIT_SUCCESS;
> }
The question is, why do you expect it to be noticeably faster? On
modern hardware the optimizations are such that so small a change
in code is very hard to link to a difference in running time. If
you really want to show one, the following code does the benchmark
the right way (i.e. using a very long input, tens of thousands of
runs, and avoiding I/O and the program's load time, so as to
compare the bare function implementations):
import std.conv;
import std.stdio;
import std.range;
import std.array;
import std.algorithm;

string testcase;

static immutable char[] QG10MatrixOne =
  "03175986427092154863420687135917509834266123045978" ~
  "36742095815869720134894536201794386172052581436790";

char checkDigitOne(string str) {
  char tmpdigit = '0';
  foreach(chr; str) tmpdigit = QG10MatrixOne[(chr - '0') + (tmpdigit - '0') * 10];
  return tmpdigit;
}

void testCheckDigitOne() {
  checkDigitOne(testcase);
}

static immutable ubyte[][] QG10MatrixTwo = [
  [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
  [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
  [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
  [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
  [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
];

static int charToInt(char chr) {
  scope(failure) return -1;
  return cast(int)(chr - '0');
}

ubyte checkDigitTwo(string str) {
  ubyte tmpdigit;
  foreach(chr; str) {
    tmpdigit = QG10MatrixTwo[tmpdigit][charToInt(chr)];
  }
  return tmpdigit;
}

void testCheckDigitTwo() {
  checkDigitTwo(testcase);
}

void main(string[] args) {
  testcase = iota(10).cycle.take(100000).map!(to!string).array.join;
  import std.datetime: benchmark;
  benchmark!(testCheckDigitOne, testCheckDigitTwo)(10000).each!writeln;
}
Yet on my machine I get the following times (note that the times
themselves depend on my hardware; what matters is the difference):
TickDuration(15785255852)
TickDuration(15784826803)
So it seems the first version is slower than the second, but by so
little that it's hard to attribute to the actual implementation. If
anything, it shows the futility of such low-level tricks as
meaningful optimizations. Also note that the benchmark avoids
measuring I/O, because we were interested in the functions
themselves. If we did measure it, I/O would represent four fifths
of the running time (on my machine, with numbers as long as the one
used in the benchmark). This means that even a 20% improvement in
checkDigit would only translate into a 4% improvement of the whole
program.
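[To make that last bit of arithmetic explicit, here is a small sketch. The 4/5 I/O share is the figure quoted above for this particular machine and input length, not a general constant.]

```d
import std.stdio;
import std.math : isClose;

void main() {
    // Shares of total running time, taken from the measurement above:
    // I/O accounts for 4/5, checkDigit for the remaining 1/5.
    double io = 0.8, compute = 0.2;

    // Making checkDigit 20% faster shrinks only its own share.
    double newTotal = io + compute * (1 - 0.2);

    // The whole program improves by only 0.2 * 20% = 4%.
    assert(isClose(1 - newTotal, 0.04));
    writefln("overall improvement: %.0f%%", (1 - newTotal) * 100);
}
```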