Help optimize D solution to phone encoding problem: extremely slow performance.
Siarhei Siamashka
siarhei.siamashka at gmail.com
Sat Jan 20 12:01:28 UTC 2024
On Saturday, 20 January 2024 at 01:27:50 UTC, H. S. Teoh wrote:
> On Sat, Jan 20, 2024 at 01:35:44AM +0100, Daniel Kozak via
> Digitalmars-d-learn wrote: [...]
>> > Try addressing the points I wrote above and see if it makes a
>> > difference.
>>
>> I have tried it (all of it) even before you wrote it here,
>> because I had completely the same ideas, but to be fair it has
>> almost zero effect on speed. Here is my version (it still uses
>> OOP, but I have tried it with Printer and Counter as structs and
>> it has no effect at all):
>> https://paste.ofcode.org/38vKWLS8DHRazpv6MTidRJY
>> The only difference in speed in the end is caused by the hash
>> implementations of dlang associative arrays and the Rust HashMap;
>> actually, if you modify the Rust version to not use ahash, it has
>> almost the same speed as D.
> [...]
>
> I'm confused by the chained hashing of the digits. Why is that
> necessary? I would have thought it'd be faster to hash the
> entire key instead of computing the hash of each digit and
> chaining them together.
Hash which key? The problem with the naive hashtable-based
algorithm is that we don't know the length of the key. So all
lengths are tried, starting from 1 and going up to the length of
the remaining part of the phone number. An additional
optimization would be to limit this search and terminate it
early: for example, stop after we exceed the length of the
longest word in the dictionary, or stop the search upon
encountering certain "leaf" words. But the latter effectively
morphs the algorithm into something that partially resembles a
Trie, and evolving it further we may end up with a full-fledged
Trie.
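The length-limited search described above can be sketched roughly like this (Python for brevity; the letter-to-digit table follows Prechelt's phone-number encoding study, while the toy dictionary, function names, and the omission of the problem's digit-insertion fallback are my own simplifications):

```python
# Letter-to-digit table from Prechelt's phone encoding study.
CHAR_TO_DIGIT = {}
for digit, letters in enumerate(["e", "jnq", "rwx", "dsy", "ft",
                                 "am", "civ", "bku", "lop", "ghz"]):
    for ch in letters:
        CHAR_TO_DIGIT[ch] = str(digit)

def encode_word(word):
    """Digit string for a word, ignoring non-letter characters."""
    return "".join(CHAR_TO_DIGIT[c] for c in word.lower() if c.isalpha())

def build_table(words):
    """Hashtable mapping each digit-string key to its dictionary words."""
    table = {}
    for w in words:
        table.setdefault(encode_word(w), []).append(w)
    return table

def encodings(digits, table, max_len):
    """All ways to split `digits` into dictionary words.

    Since the key length isn't known in advance, every prefix length
    is tried starting from 1; `max_len` (the longest key in the table)
    terminates the search early.
    """
    if not digits:
        return [[]]
    results = []
    for length in range(1, min(len(digits), max_len) + 1):
        for word in table.get(digits[:length], []):
            for rest in encodings(digits[length:], table, max_len):
                results.append([word] + rest)
    return results

words = ["an", "fern", "mir", "Torf"]  # toy dictionary
table = build_table(words)
max_len = max(len(k) for k in table)
print(encodings("4021", table, max_len))  # prints [['fern']]
```

Without the `max_len` bound, the inner loop would keep hashing ever-longer prefixes up to the full remaining number even though no dictionary word can match, which is exactly the wasted work the early-termination optimization avoids.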
The only practical value of this algorithm is that it's easy to
implement and has a low LOC count. My initial non-optimized naive
version was similar:
https://gist.github.com/ssvb/abe821b3cdba7fcb7f3c3cecde864153 (66
lines, including blank lines and some comments). This is somewhat
comparable to the Lisp results at
https://flownet.com/ron/papers/lisp-java/raw-results.html or to
Peter Norvig's solution at http://www.norvig.com/java-lisp.html
The purpose of the initial "prototype" implementation is just to
have some sort of starting point. If runtime performance doesn't
matter, then we are already done. But if it does matter, then a
lot of things can be rapidly improved by spending a bit of time
on it. Eventually one runs out of ideas and further optimization
efforts yield only diminishing returns. That's a good time to
clean up the code, add comments, etc., and label it as the
"final" version.
I think that a good study intending to compare programming
languages against each other could take all of these factors into
consideration: time to implement the "prototype", runtime
performance of the prototype, time to implement the "final"
version, runtime performance of the final version, the number of
lines of code, and its readability/maintainability.