Help optimize D solution to phone encoding problem: extremely slow performace.

Sat Jan 20 12:01:28 UTC 2024

On Saturday, 20 January 2024 at 01:27:50 UTC, H. S. Teoh wrote:
> On Sat, Jan 20, 2024 at 01:35:44AM +0100, Daniel Kozak via 
> Digitalmars-d-learn wrote: [...]
>>    > Try addressing the points I wrote above and see if it 
>> makes a
>>    > difference.
>> 
>>    I have tried it (all of it) even before you wrote it here, 
>> because
>>    I have completely the same ideas, but to be fair it has 
>> almost zero
>>    effect on speed.
>>    There is my version (It still use OOP, but I have try it wit
>>    Printer and Counter to be structs and it has no effect at
>>    all) [2]https://paste.ofcode.org/38vKWLS8DHRazpv6MTidRJY
>>    The only difference in speed in the end is caused by hash
>>    implementation of dlang associative arrays and rust HashMap,
>>    actually if you modify rust to not used ahash it has almost 
>> same
>>    speed as D
> [...]
>
> I'm confused by the chained hashing of the digits. Why is that 
> necessary?  I would have thought it'd be faster to hash the 
> entire key instead of computing the hash of each digit and 
> chaining them together.

Hash which key? The problem of the naive hashtable-based 
algorithm is that we don't know the length of the key. So all 
lengths are tried starting from 1 and up to the remaining part of 
the phone number. An additional optimization would be to limit 
this search and terminate it early. For example, stop after we 
exceed the length of the longest word from the dictionary. Or 
stop the search upon encountering certain "leaf" words, but this 
effectively morphs the algorithm into something that partially 
resembles Trie. And further evolving it we may end up with a 
full-fledged Trie.

The only practical value of this algorithm is that it's easy to 
implement and has low LOC count. My initial non-optimized naive 
version was also similar 
https://gist.github.com/ssvb/abe821b3cdba7fcb7f3c3cecde864153 (66 
lines including blank lines and some comments). This is somewhat 
comparable to the Lisp results 
https://flownet.com/ron/papers/lisp-java/raw-results.html or to 
the Peter Norvig's solution http://www.norvig.com/java-lisp.html

The purpose of the initial implementation "prototype" is just to 
have some sort of a starting point. And if the runtime 
performance doesn't matter, then we are already done. But if the 
runtime performance does matter, then a lot of things can be 
rapidly improved by spending a bit of time on it. Eventually one 
runs out of ideas and further optimization efforts yield only 
diminishing returns. That's when it's a good time to clean up the 
code, add comments, etc. and label it as the "final" version.

I think that a good study, that intends to compare different 
programming languages against each other, could take all these 
factors into consideration: time to implement the "prototype", 
runtime performance of the prototype, time to implement the 
"final" version, runtime performance of the final version, the 
number of lines of code and its readability/maintainability.