"BOLT" post-link optimizer gives 15% speed boost to Clang
Walter Bright
newshound2 at digitalmars.com
Wed Oct 24 01:57:38 UTC 2018
On 10/23/2018 3:07 PM, Johan Engelen wrote:
> Hi all,
> "Post-link optimization", what? Yes, indeed, optimization _after_ the
> _linker_ has generated a _binary_.
> Read about this interesting work and the discussion here:
> https://lists.llvm.org/pipermail/llvm-dev/2018-October/126985.html
>
> When applied to clang, the performance gain is 15% faster execution (note,
> that's the result of applying BOLT on a clang binary that was already built with
> PGO and LTO !)
>
> Cheers,
> Johan
>
Digital Mars C++ had this in the 1990s.
https://www.digitalmars.com/ctg/trace.html
How it works: the -gt switch is thrown to generate a runtime profile of the
graph of how functions call each other. From that profile, a module definition
file https://www.digitalmars.com/ctg/ctgDefFiles.html is generated that directs
the linker to order the layout of functions in the executable to minimize cache
misses.
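
To make the idea concrete, here is a minimal sketch of profile-driven function
ordering. It is an illustration only, not the algorithm Digital Mars, "The
Segmentor", or BOLT actually uses: it assumes the profile has already been
reduced to call counts per (caller, callee) pair, and it greedily merges chains
of functions along the hottest call edges so that functions which call each
other often end up close together. The function name order_functions and the
sample profile are hypothetical.

```python
def order_functions(call_counts):
    """Greedy layout: merge the chains joined by the hottest call edges first.

    call_counts maps (caller, callee) -> number of calls seen in the profile.
    Returns a list of function names in the suggested link order.
    """
    # Every function starts out alone in its own chain.
    chain_of = {}  # function name -> the chain (list) that currently holds it
    for caller, callee in call_counts:
        for fn in (caller, callee):
            chain_of.setdefault(fn, [fn])

    # Visit edges from hottest to coldest; ties are broken by name so the
    # result is deterministic.
    hottest_first = sorted(call_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (caller, callee), _count in hottest_first:
        a, b = chain_of[caller], chain_of[callee]
        if a is b:
            continue  # already in the same chain
        a.extend(b)   # append the callee's chain to the caller's chain
        for fn in b:
            chain_of[fn] = a

    # Emit each distinct chain once, in first-seen order.
    seen, layout = set(), []
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout


# Hypothetical profile: call counts gathered from one instrumented run.
profile = {
    ("main", "parse"): 1000,
    ("parse", "lex"): 5000,
    ("main", "report_error"): 1,
    ("lex", "next_char"): 20000,
}
layout = order_functions(profile)
```

With this sample profile the hot path (parse, lex, next_char) is laid out
contiguously and the rarely called report_error lands at the end. A real tool
would then write the resulting order out in whatever form the linker accepts,
which for Digital Mars is the module definition file mentioned above.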
A 15% speedup is about right for this sort of optimization.
Note that Digital Mars didn't invent this. It was invented by (I forgot who),
who productized it as "The Segmentor".
From
https://github.com/facebookincubator/BOLT/blob/master/docs/OptimizingClang.md :
"BOLT (Binary Optimization and Layout Tool) is designed to improve the
application performance by laying out code in a manner that helps CPU better
utilize its caching and branch predicting resources.
"Before we can run BOLT optimizations, we need to collect the profile for Clang,
Yup, it's a reinvention of the same thing.