Performance of tables slower than built in?

Timon Gehr timon.gehr at gmx.ch
Thu May 23 15:20:22 UTC 2019


On 23.05.19 12:21, Alex wrote:
> On Wednesday, 22 May 2019 at 00:55:37 UTC, Adam D. Ruppe wrote:
>> On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
>>> I am trying to create some fast sin, sinc, and exponential routines 
>>> to speed up some code by using tables... but it seems it's slower 
>>> than the function itself?!?
>>
>> There are intrinsic CPU instructions for some of those that can do 
>> the math faster than waiting on memory access.
>>
>> It is quite likely that calculating it is actually faster. Even 
>> carefully written and optimized tables tend to have only a very 
>> small win relative to the CPU nowadays.
> 
> Surely not? I'm not sure what method is used to calculate them, and 
> maybe a table method is used internally for the common functions 
> (maybe the periodic ones), but memory access is surely faster than 
> multiplying doubles?
> ...

Depends on what kind of memory access, and what kind of faster. If you 
hit L1 cache then a memory access might be (barely) faster than a single 
double multiplication. (But modern hardware usually can do multiple 
double multiplies in parallel, and presumably also multiple memory 
reads, using SIMD and/or instruction-level parallelism.)

I think a single in-register double multiplication will be roughly 25 
times faster than an access to main memory. Each access to main memory 
will pull an entire cache line from main memory to the cache, so if you 
have good locality (you usually won't with a LUT), your memory accesses 
will be faster on average. There are a lot of other microarchitectural 
details that can matter quite a lot for performance.
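For concreteness, here is a minimal sketch of the kind of LUT-based sine under discussion: a 1024-entry table over [0, 2*pi) with linear interpolation. It is written in Python purely for illustration (not D), and the table size and interpolation scheme are assumptions, not anything from the original poster's code:

```python
import math

# Hypothetical lookup-table sine: 1024 entries over [0, 2*pi),
# with linear interpolation between adjacent entries.
N = 1024
STEP = 2.0 * math.pi / N
TABLE = [math.sin(i * STEP) for i in range(N + 1)]  # extra entry avoids wrap logic

def sin_lut(x):
    # Reduce the argument to [0, 2*pi); this fmod is part of the cost
    # the original poster mentions ("the modulos").
    x = math.fmod(x, 2.0 * math.pi)
    if x < 0.0:
        x += 2.0 * math.pi
    pos = x / STEP
    i = int(pos)
    frac = pos - i
    # Two table reads, then one multiply and two adds.
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])
```

With 1024 entries the interpolation error is bounded by roughly (2*pi/1024)^2 / 8, i.e. about 5e-6; each further bit of accuracy doubles the table, which is exactly what makes the cache behavior worse.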

> And most of the time these functions are computed by some series that 
> requires many terms. I'd expect, say, that computing sin would require 
> at least 10 multiplies for any accuracy... and surely that is much 
> slower than simply accessing a table (it's true that my code is more 
> complex due to the modulos, and maybe that is eating up the diff).
> 
> Do you have any proof of your claims? Like a paper that discusses such 
> things, so I can see what's really going on and how they achieve such 
> performance (and how accurate it is)?
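On the "at least 10 multiplies" point: a degree-11 Taylor polynomial evaluated with Horner's scheme needs only 7 multiplies total and reaches roughly 1e-7 accuracy after range reduction. A sketch in Python for illustration (plain Taylor coefficients; a real libm would use minimax coefficients and more careful reduction):

```python
import math

# Taylor-series coefficients for sin, through the x**11 term.
C3  = -1.0 / 6.0
C5  =  1.0 / 120.0
C7  = -1.0 / 5040.0
C9  =  1.0 / 362880.0
C11 = -1.0 / 39916800.0

def sin_poly(x):
    # Reduce to [-pi, pi], then fold into [-pi/2, pi/2], where the
    # series converges quickly.
    x = math.fmod(x, 2.0 * math.pi)
    if x > math.pi:
        x -= 2.0 * math.pi
    elif x < -math.pi:
        x += 2.0 * math.pi
    if x > math.pi / 2:
        x = math.pi - x        # sin(pi - x) == sin(x)
    elif x < -math.pi / 2:
        x = -math.pi - x       # sin(-pi - x) == sin(x)
    # Horner's scheme on y = x*x: 7 multiplies in total, counting
    # y = x*x and the final multiply by x.
    y = x * x
    p = C11
    p = p * y + C9
    p = p * y + C7
    p = p * y + C5
    p = p * y + C3
    return x * (p * y + 1.0)
```

Those 7 multiplies are all in-register, with no chance of a cache miss, which is why the intrinsic/polynomial route tends to win over a table.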

Not exactly what you asked, but this might help:
https://www.agner.org/optimize

Also, look up the CORDIC algorithm.
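For reference, here is a minimal rotation-mode CORDIC sketch in Python. In real implementations it runs in fixed-point arithmetic, so each `2.0 ** -i` multiply below is just a bit shift; the iteration then needs no multiplier at all, which is why CORDIC was popular in hardware and calculators:

```python
import math

ITERS = 32
# Precomputed table of rotation angles atan(2**-i).
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERS)]

# Total gain of the micro-rotations; starting the vector at (1/K, 0)
# folds the normalization in for free.
K = 1.0
for i in range(ITERS):
    K *= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sincos(theta):
    # Converges for |theta| <~ 1.74 rad; a real implementation would
    # reduce the argument into range first.
    x, y, z = 1.0 / K, 0.0, theta
    for i in range(ITERS):
        d = 1.0 if z >= 0.0 else -1.0
        # Shift-and-add micro-rotation by +/- atan(2**-i).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x  # (sin(theta), cos(theta))
```

Each iteration adds roughly one bit of accuracy, so 32 iterations comfortably exceeds single precision.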


More information about the Digitalmars-d-learn mailing list