Performance of tables slower than built in?

Fri May 24 12:01:55 UTC 2019

On Friday, 24 May 2019 at 11:45:46 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 24 May 2019 at 08:33:34 UTC, Ola Fosheim Grøstad 
> wrote:
>> On Thursday, 23 May 2019 at 21:47:45 UTC, Alex wrote:
>>> Either way, sin it's still twice as fast. Also, in the code 
>>> the sinTab version is missing the writeln so it would have 
>>> been faster.. so it is not being optimized out.
>>
>> Well, when I run this modified version:
>>
>> https://gist.github.com/run-dlang/9f29a83b7b6754da98993063029ef93c
>>
>> on https://run.dlang.io/
>>
>> then I get:
>>
>> LUT:    709
>> sin(x): 2761
>>
>> So the LUT is 3-4 times faster even with your quarter-LUT 
>> overhead.
>
> FWIW, as far as I can tell I managed to get the lookup version 
> down to 104 by using bit manipulation tricks like these:
>
> auto fastQuarterLookup(double x){
>     const ulong mantissa = cast(ulong)( (x - floor(x)) * 
> (cast(double)(1UL<<63)*2.0) );
>     const double sign = 
> cast(double)(-cast(uint)((mantissa>>63)&1));
>     … etc
>
> So it seems like a quarter-wave LUT is 27 times faster than sin…
>
> You just have to make sure that the generated instructions 
> fills the entire CPU pipeline.

Well, the QuarterWave was supposed to generate just a quarter, 
since that is all these functions require due to symmetry and 
periodicity. I started with a half to get that working and was 
then going to figure out the sign flipping.

Essentially one just has to tabulate a quarter of sin, that is, 
from 0 to 90°, and then get the sign right. This allows one to 
have 4 times the resolution for the same table size, or 1/4 the 
size for the same resolution.

Or, to put it another way, sin has 4-fold redundancy.
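
A minimal sketch of the quarter-table idea in D, for illustration 
only. The names makeQuarterTable and quarterSin, the radian input 
and the nearest-sample rounding are assumptions made for this 
example; this is not the code from the gist above and it makes no 
attempt at the bit tricks. It does not handle NaN or infinite input.

```d
import std.math : PI, floor, sin;

/// Tabulate sin over the first quarter period only:
/// n + 1 samples on [0, PI/2], both end points included.
double[] makeQuarterTable(size_t n)
{
    auto tab = new double[](n + 1);
    foreach (i; 0 .. n + 1)
        tab[i] = sin(0.5 * PI * i / n);
    return tab;
}

/// Approximate sin(x), x in radians, using only the quarter table
/// (nearest sample, no interpolation).
double quarterSin(const double[] tab, double x)
{
    immutable n = tab.length - 1;
    immutable turns = x / (2.0 * PI);
    immutable pos = (turns - floor(turns)) * 4.0; // position in [0, 4]
    immutable whole = cast(int) pos;
    double t = pos - whole;                 // offset inside the quadrant
    immutable quadrant = whole & 3;         // also wraps the rare pos == 4.0
    if (quadrant & 1)
        t = 1.0 - t;                        // quadrants 1 and 3 read the table backwards
    immutable v = tab[cast(size_t)(t * n + 0.5)]; // nearest entry, index in 0 .. n
    return quadrant >= 2 ? -v : v;          // second half of the period is negated
}
```

Calling makeQuarterTable(1024) once and then quarterSin(tab, x) 
should stay within about half a table step of sin(x), while a 
full-period table with the same spacing would need four times as 
many entries.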

I'll check out your code, thanks for looking into it.