Help optimize D solution to phone encoding problem: extremely slow performace.

Tue Jan 16 18:54:56 UTC 2024

On Tuesday, 16 January 2024 at 16:56:04 UTC, Siarhei Siamashka 
wrote:
> On Tuesday, 16 January 2024 at 15:50:35 UTC, H. S. Teoh wrote:
>> Unfortunately there seems to be some discrepancy between the 
>> output I got and the prescribed output in your repository. For 
>> example, in your output the number 1556/0 does not have an 
>> encoding, but isn't "1 Mai 0" a valid encoding according to 
>> your dictionary and the original problem description?
>
> You are not allowed to emit "1" as the first token in the 
> output as long as there are any dictionary word matches at that 
> position. The relevant paragraph from the problem statement:
>
> ==snip==
> Encodings of phone numbers can consist of a single word or of 
> multiple
> words separated by spaces. The encodings are built word by word 
> from
> left to right. If and only if at a particular point no word at 
> all from
> the dictionary can be inserted, a single digit from the phone 
> number can
> be copied to the encoding instead. Two subsequent digits are 
> never
> allowed, though. To put it differently: In a partial encoding 
> that
> currently covers k digits, digit k+1 is encoded by itself if 
> and only if,
> first, digit k was not encoded by a digit and, second, there is 
> no word
> in the dictionary that can be used in the encoding starting at 
> digit k+1.
> ==snip==
>
> I also spent a bit of time trying to figure out this nuance 
> when implementing my solution. It doesn't make much sense 
> visually (no back-to-back digits in the output either way), but 
> that's how it is.

Exactly, this is one of the things that make this problem a bit 
annoying to solve :)

@"H. S. Teoh" you implemented the solution as a Trie!! Nice, 
that's also what I did when I "participated" in the study. Here's 
[my Trie solution in 
Java](https://github.com/renatoathaydes/prechelt-phone-number-encoding/blob/fastest-implementations-print-or-count/src/java/Main.java).

These are basically the two common approaches to the problem: a 
Trie or a numeric-based table. According to the study, people who 
use scripting languages almost always go with the numeric 
approach, while people coming from lower level languages tend to 
use a data structure like Trie (if they don't know Trie, they 
come up with something similar which is fascinating), which is 
harder to implement but more efficient in general.

Can I ask you why didn't you use the [D stdlib 
Trie](https://dlang.org/phobos/std_uni.html#codepointTrie)? Not 
sure that would've worked, but did you consider that?