printf() metaprogramming challenge

Sat May 25 11:27:43 UTC 2019

On Saturday, 25 May 2019 at 07:26:47 UTC, Ola Fosheim Grøstad 
wrote:
> On Friday, 24 May 2019 at 23:55:13 UTC, Jonathan Marler wrote:
>> Ulf's algorithm can be implemented in only a few hundred lines 
>> and apparently is the fastest implementation to-date that 
>> maintains a 100% robust algorithm.

>
> It is quite interesting that you get that performance without 
> bloat.

L1 instruction caches are small, and the cost of code bloat is only 
rarely counted. Benchmarks are overwhelmingly well behaved with 
respect to the instruction cache, so optimisation for the 
instruction cache gets neglected.

On one of our projects I once had a heavily optimised function with 
a lot of subcases, loop unrolling, etc. In the test benchmark it was 
faster than all the alternatives. In the final application, 
however, a simple 2-line loop in plain C outran it. With Valgrind's 
cachegrind I discovered that the instruction cache misses made a 
big, big difference.
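
To make the trade-off concrete, here is a rough sketch in C. It is 
not the function from that project; the names sum_simple and 
sum_unrolled and the byte-summing task are made up for 
illustration. Both variants return the same result, but the 
unrolled one takes up more code. The difference in code size here 
is small and only shows the shape of the problem.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: compact loop with a small code footprint. */
uint32_t sum_simple(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Hypothetical example: hand-unrolled by 8. Same result as
 * sum_simple, but more instructions competing for the L1
 * instruction cache with the rest of the application. */
uint32_t sum_unrolled(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    size_t i = 0;

    /* Main body: process eight bytes per iteration. */
    for (; i + 8 <= n; i += 8) {
        s += p[i];
        s += p[i + 1];
        s += p[i + 2];
        s += p[i + 3];
        s += p[i + 4];
        s += p[i + 5];
        s += p[i + 6];
        s += p[i + 7];
    }

    /* Tail: handle the remaining 0..7 bytes. */
    for (; i < n; i++)
        s += p[i];

    return s;
}
```

A micro-benchmark that calls only one of these in a tight loop keeps 
the whole function in the instruction cache, so it cannot show the 
effect. Running the real application under 
`valgrind --tool=cachegrind` and comparing the reported I1 misses 
for each variant is closer to what I did. Note that an optimising 
compiler may unroll or vectorise the simple loop on its own, so the 
generated code is worth checking too.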

>
> I wonder if it is faster than the special cased float 
> implementations. (using an estimator that chooses a faster 
> floating point version where it works).
>
>> But it's very new, only a year old I think.  Cool innovation.
>
>