"BOLT" post-link optimizer gives 15% speed boost to Clang

Walter Bright newshound2 at digitalmars.com
Wed Oct 24 01:57:38 UTC 2018


On 10/23/2018 3:07 PM, Johan Engelen wrote:
> Hi all,
>    "Post-link optimization", what? Yes, indeed, optimization _after_ the 
> _linker_ has generated a _binary_.
> Read about this interesting work and the discussion here: 
> https://lists.llvm.org/pipermail/llvm-dev/2018-October/126985.html
> 
> When applied to clang, the performance gain is 15% faster execution (note, 
> that's the result of applying BOLT on a clang binary that was already built with 
> PGO and LTO !)
> 
> Cheers,
>    Johan
> 

Digital Mars C++ had this in the 1990s.

https://www.digitalmars.com/ctg/trace.html

How it works: the -gt switch is thrown to generate a runtime profile of the graph
of how functions call each other. Then a module definition file
(https://www.digitalmars.com/ctg/ctgDefFiles.html) is generated that directs the
linker to lay out the functions in the executable in an order that minimizes
cache misses.
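The post doesn't spell out how a link order is computed from the call-graph profile, and neither the Digital Mars trace tool nor BOLT is claimed to work exactly this way. As a minimal sketch under that assumption, the Python below implements a simplified greedy chain-merging heuristic in the spirit of Pettis-Hansen code positioning: walk the call edges hottest first and concatenate the chains of caller and callee, so functions that call each other often end up adjacent. The function name `order_functions` and the `(caller, callee) -> count` profile format are hypothetical, and the output is a plain list of names, not the syntax of a real module definition file.

```python
from collections import defaultdict


def order_functions(call_counts):
    """Suggest a function link order from a call-count profile.

    call_counts is a hypothetical profile format mapping
    (caller, callee) -> number of calls observed at run time.
    """
    # Treat calls as undirected "keep these close" weights;
    # self-recursion does not affect the relative layout.
    weights = defaultdict(int)
    funcs = set()
    for (caller, callee), count in call_counts.items():
        funcs.update((caller, callee))
        if caller != callee:
            weights[tuple(sorted((caller, callee)))] += count

    # Every function starts out as its own one-element chain.
    chain_of = {f: [f] for f in funcs}

    # Visit edges hottest first (ties broken by name for determinism)
    # and append the callee's chain to the caller's chain.
    for (a, b), _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        ca, cb = chain_of[a], chain_of[b]
        if ca is cb:
            continue  # already placed in the same chain
        merged = ca + cb
        for f in merged:
            chain_of[f] = merged

    # Per-function "heat" = total weight of edges touching it.
    heat = defaultdict(int)
    for (a, b), w in weights.items():
        heat[a] += w
        heat[b] += w

    # Deduplicate chains, then place the hottest chains first.
    chains = {id(c): c for c in chain_of.values()}.values()
    ordered = sorted(chains, key=lambda c: (-sum(heat[f] for f in c), c[0]))
    return [f for chain in ordered for f in chain]


if __name__ == "__main__":
    profile = {
        ("main", "parse"): 10,
        ("parse", "lex"): 1000,
        ("main", "codegen"): 10,
        ("codegen", "emit"): 500,
        ("main", "usage"): 1,
    }
    print(order_functions(profile))
    # prints ['codegen', 'emit', 'main', 'lex', 'parse', 'usage']
```

The full Pettis-Hansen heuristic also chooses which ends of two chains to join; this sketch always appends, which is why `main` can land between `emit` and `lex`. The hot pairs (`lex`/`parse`, `codegen`/`emit`) still end up adjacent, which is the property that reduces cache misses.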

A 15% speedup is about right for this sort of optimization.

Note that Digital Mars didn't invent this; it was invented by someone (I forget
who) who productized it as "The Segmentor".

From
https://github.com/facebookincubator/BOLT/blob/master/docs/OptimizingClang.md :

"BOLT (Binary Optimization and Layout Tool) is designed to improve the
application performance by laying out code in a manner that helps CPU better
utilize its caching and branch predicting resources."

"Before we can run BOLT optimizations, we need to collect the profile for Clang, ..."

Yup, it's a reinvention of the same thing.
