Help optimize D solution to phone encoding problem: extremely slow performace.

Sat Jan 20 20:17:14 UTC 2024

On Saturday, 20 January 2024 at 19:45:19 UTC, Daniel Kozak wrote:
> On Sat, Jan 20, 2024 at 2:11 PM Renato via Digitalmars-d-learn 
> < digitalmars-d-learn at puremagic.com> wrote:
>
>> Wow, fantastic feedback from lots of people, thanks so much! 
>> ...
>>
>> > evilrat:
>> > There is another important difference, i quickly looked up D
>> > associative array implementation and the search looks like
>> > nlog(n) complexity with plain loop iteration of hashes, 
>> > whereas
>> > rust's implementation is using "swisstable" algorithm
>> > implementation that has packed SIMD optimized lookups, this 
>> > is
>> > likely where the 3x speed difference comes from.
>>
>> I am not using the default hash implementation in Rust (which 
>> is hashbrown as you've found out). That's too slow (because 
>> it's ddos secure - still Java's hash also is and it's still 
>> faster). I had to replace that with a library called `ahash`.
>>
>> If you're interested in knowing more about that, please [read 
>> my blog 
>> post](https://renato.athaydes.com/posts/how-to-write-fast-rust-code) about optimising the Rust code.
>>
>> So, no, that's not where the difference is coming from... in 
>> fact, I am using a faster hashing function in D.
>>
>> You can [see my custom hashing function
>> here](
>> https://github.com/renatoathaydes/prechelt-phone-number-encoding/commit/d1fa1b77ad928f7d536728fba11f2d69b4afe7e3).
>> This function is not good for the general purpose case, but 
>> it's probably
>> as good as it gets for this problem (see my int128 trick in a 
>> previous post
>> on this thread to understand why). I did try a few variations 
>> of hashing
>> before settling on this one... Rust, if anything, is at a 
>> disadvantage here.
>>
>
> And here you are wrong, it is the hashing when the slowdown 
> comes. It is not only about the speed of hashing function. It 
> is about the quality of hashing functiuon for this particular 
> problem and it is about how hashing table (associative array) 
> is implemented.

I've explained why this function is almost a perfect hash 
function for this problem (there will be almost no collisions 
except for very large inputs where the shift-left will 
"overflow", and even then the probability of collisions should be 
very low). If you're going to claim I'm wrong, you must show 
exactly where I'm wrong, and preferrably you should provide a 
faster implementation. I will even accept a theoretical 
explanation if you can give one. What I will not accept, is for 
you to call me "wrong" just because you say so. That's childish 
behaviour and uncalled for on a technical discussion.