Performance of tables slower than built in?
Timon Gehr
timon.gehr at gmx.ch
Thu May 23 15:20:22 UTC 2019
On 23.05.19 12:21, Alex wrote:
> On Wednesday, 22 May 2019 at 00:55:37 UTC, Adam D. Ruppe wrote:
>> On Wednesday, 22 May 2019 at 00:22:09 UTC, JS wrote:
>>> I am trying to create some fast sin, sinc, and exponential routines
>>> to speed up some code by using tables... but it seems it's slower
>>> than the function itself?!?
>>
>> There's intrinsic cpu instructions for some of those that can do the
>> math faster than waiting on memory access.
>>
>> It is quite likely calculating it is actually faster. Even carefully
>> written and optimized tables tend to just have a very small win
>> relative to the cpu nowadays.
>
> Surely not? I'm not sure what method is used to calculate them and maybe
> a table method is used internally for the common functions(maybe the
> periodic ones) but memory access surely is faster than multiplying doubles?
> ...
Depends on what kind of memory access, and what kind of faster. If you
hit L1 cache then a memory access might be (barely) faster than a single
double multiplication. (But modern hardware usually can do multiple
double multiplies in parallel, and presumably also multiple memory
reads, using SIMD and/or instruction-level parallelism.)
I think a single in-register double multiplication will be roughly 25
times faster than an access to main memory. Each access to main memory
will pull an entire cache line from main memory to the cache, so if you
have good locality (you usually won't with a LUT), your memory accesses
will be faster on average. There are a lot of other microarchitectural
details that can matter quite a lot for performance.
> And most of the time these functions are computed by some series that
> requires many terms. I'd expect, say, to compute sin one would require
> at least 10 multiplies for any accuracy... and surely that is much
> slower than simply accessing a table(it's true that my code is more
> complex due to the modulos and maybe that is eating up the diff).
>
> Do you have any proof of your claims? Like a paper that discusses such
> things so I can see what's really going on and how they achieve such
> performance(and how accurate)?
Not exactly what you asked, but this might help:
https://www.agner.org/optimize
Also, look up the CORDIC algorithm.
More information about the Digitalmars-d-learn
mailing list