How to squeeze more performance out of gcd?

Uknown sireeshkodali1 at gmail.com
Sat Jun 13 14:25:21 UTC 2020


On Friday, 12 June 2020 at 23:27:35 UTC, H. S. Teoh wrote:
> So this week I finished a major feature in one of my projects, 
> which involves adding a custom number type that can do exact 
> arithmetic with quadratic irrationals (numbers of the form 
> (a+b*√r)/c with integer a,b,c, and fixed r). The custom number 
> type, unsurprisingly, performs about 5x worse than using 
> `double`, but with the plus side that arithmetic is exact so I 
> never have to worry about round-off errors.
>
> Profiling a typical use case reveals that the most prominent 
> bottleneck (50+% of total running time) is std.numeric.gcd.  
> Inspecting the disassembly shows that it's using the fast 
> binary GCD algorithm (Stein's algorithm) with the bsf and cmovg 
> instructions (this is with `ldc2 -mcpu=native -O3`), which is 
> as fast as you can get for GCD, AFAIK.
>
> Is there a way to squeeze more juice out of gcd?  Like is there 
> some novel algorithm or hack that could trim that 50% figure a 
> little?
> [...]
> T

I was doing some competitive programming last month and fast-gcd 
came up, this was the fastest I found back then: 
https://lemire.me/blog/2013/12/26/fastest-way-to-compute-the-greatest-common-divisor/

Of course it turned out the actual fast approach to my problem 
was to skip the gcd calculations entirely, because they were 
irrelevant, but that blog post may help you.


More information about the Digitalmars-d mailing list