printf() metaprogramming challenge
Patrick Schluter
Patrick.Schluter at bbox.fr
Sat May 25 11:27:43 UTC 2019
On Saturday, 25 May 2019 at 07:26:47 UTC, Ola Fosheim Grøstad
wrote:
> On Friday, 24 May 2019 at 23:55:13 UTC, Jonathan Marler wrote:
>> Ulf's algorithm can be implemented in only a few hundred lines
>> and apparently is the fastest implementation to-date that
>> maintains a 100% robust algorithm.
>
> It is quite interesting that you get that performance without
> bloat.
L1 instruction caches are small, and the cost of code bloat is
only rarely counted. Benchmarks are overwhelmingly well-behaved
with respect to the instruction cache.
As a result, optimisation for the instruction cache is neglected.
I once had, on our project, a heavily optimised function with a
lot of subcases, loop unrolling etc. In the test benchmark it was
faster than all the alternatives. In the final application,
however, a simple 2-line loop in pure C outran it. With valgrind's
cachegrind I discovered that the misses in the instruction cache
made a big, big difference.
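To make the trade-off concrete, here is a rough sketch of the two
kinds of code I mean. It is not the function from our project: the
names sum_simple and sum_unrolled and the byte-summing task are
made up for illustration. Both give the same result, but the
second one takes up more room in the instruction cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simple variant: a two-line loop, tiny code footprint. */
uint32_t sum_simple(const uint8_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical hand-unrolled variant: same result, more instructions. */
uint32_t sum_unrolled(const uint8_t *buf, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Main loop handles four bytes per iteration. */
    for (; i + 4 <= n; i += 4)
        sum += (uint32_t)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];

    /* Tail loop handles the remaining 0 to 3 bytes. */
    for (; i < n; i++)
        sum += buf[i];

    return sum;
}
```

An isolated benchmark that calls only one of these in a tight loop
keeps it hot in the cache, so it will not show the cost. To see it,
run the whole application under
valgrind --tool=cachegrind (recent Valgrind versions may need
--cache-sim=yes as well) and compare the I1 miss counts.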
>
> I wonder if it is faster than the special cased float
> implementations. (using an estimator that chooses a faster
> floating point version where it works).
>
>> But it's very new, only a year old I think. Cool innovation.
>
>