LDC with Profile-Guided Optimization (PGO)

Tue Dec 15 15:05:38 PST 2015

Hi all,
   I have been working on adding profile-guided optimization (PGO) 
to LDC [1][2][3].
At this point, I'd like to hear your input and hope you can help 
with testing!

Unfortunately, to try it out, you will need to build LDC with
LLVM 3.7 yourself. PGO should work on OS X, Linux, and Windows.

A first implementation is now mostly complete: it can generate an
executable that writes out profile data, and it can use that
profile data during a second compilation pass (passing branch
frequencies on to LLVM). LDC itself does not perform any
PGO-specific optimizations (yet); that is left to LLVM.

It works like PGO with Clang, using the -fprofile-instr-generate
and -fprofile-instr-use command-line options [4]:
> ldc2 -fprofile-instr-generate=test.profraw -run test.d
> llvm-profdata merge test.profraw -output test.profdata
> ldc2 -fprofile-instr-use=test.profdata test.d -of=test
You should now have the executable "test" with an amazing 
performance boost ;-)
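If you want to repeat the three steps above for several programs, a
small driver script can help. This is only a sketch, assuming ldc2
and llvm-profdata are on your PATH; the helper names pgo_commands and
run_pgo are hypothetical and not part of LDC. It uses exactly the
flags shown above and nothing else.

```python
import subprocess


def pgo_commands(source, exe, profraw="test.profraw", profdata="test.profdata"):
    """Return the three command lines of the instrument/merge/use cycle."""
    return [
        # 1. Build an instrumented binary and run it; it writes raw profile data.
        ["ldc2", f"-fprofile-instr-generate={profraw}", "-run", source],
        # 2. Convert the raw profile into the indexed format the compiler reads.
        ["llvm-profdata", "merge", profraw, "-output", profdata],
        # 3. Recompile with the profile so LLVM can use the branch frequencies.
        ["ldc2", f"-fprofile-instr-use={profdata}", source, f"-of={exe}"],
    ]


def run_pgo(source, exe):
    """Run the cycle, stopping at the first command that fails."""
    for cmd in pgo_commands(source, exe):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_pgo("test.d", "test") runs the same three commands as the
example above. Keeping the command construction separate from
execution makes it easy to print or check the commands before running
them.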

You can inspect the generated code using LDC's -output-ll switch.
Functions should be annotated with their entry counts, and most
branches should be annotated with branch_weights metadata. For
example:
> define void @for_loop() #0 !prof !12
> ...
> !12 = !{!"function_entry_count", i64 234}
for "void for_loop()", which was called 234 times, and
> br i1 %3, label %if, label %else, !prof !17
> ...
> !17 = !{!"branch_weights", i32 5, i32 3}
for "if (condition) {...} else {...}"
The branch_weights have an offset of 1, so the above means that
the condition was true 4 times and false 2 times. If a certain
piece of code is never executed, no metadata is added (i.e. you
won't see {!"branch_weights", i32 1, i32 1}). Some branches are
intentionally not instrumented/annotated if they lead to
terminating code (e.g. array bounds checks and the auto-generated
null checks on "this" at class method entry).
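To read the counts back out of a larger .ll file, you can scan it for
these metadata nodes and undo the +1 offset on branch_weights. This is
a rough sketch, assuming the metadata looks exactly like the lines
shown above (one node per line, integer operands only); the function
name decode_prof_metadata is hypothetical and not an LDC or LLVM API.

```python
import re

# Matches metadata nodes of the shape shown in the example, e.g.
#   !17 = !{!"branch_weights", i32 5, i32 3}
#   !12 = !{!"function_entry_count", i64 234}
_PROF_RE = re.compile(
    r'^(!\d+) = !\{!"(branch_weights|function_entry_count)"((?:, i\d+ \d+)+)\}'
)


def decode_prof_metadata(ll_text):
    """Map each metadata node id to (kind, counts), with branch offsets removed."""
    result = {}
    for line in ll_text.splitlines():
        m = _PROF_RE.match(line.strip())
        if not m:
            continue  # not a profile metadata node
        node, kind, operands = m.groups()
        values = [int(v) for v in re.findall(r"i\d+ (\d+)", operands)]
        if kind == "branch_weights":
            # Branch weights are stored as (execution count + 1).
            values = [v - 1 for v in values]
        result[node] = (kind, values)
    return result
```

For the example above, node !17 decodes to ("branch_weights", [4, 2])
and node !12 to ("function_entry_count", [234]).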

I hope you will be able to test and comment on the work. I am
very interested in hearing about performance gains (or losses, or
no change) for your programs, and I am curious to learn for which
kinds of code it makes a difference in practice.

Thanks!
   Johan

(future work will probably include coverage analysis (llvm-cov) 
and support for sampling-based profiles, which should fit 
naturally with the current implementation)

[1] http://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
[2] https://github.com/JohanEngelen/ldc/tree/pgo  (warning: I 
will rebase soon)
[3] https://github.com/ldc-developers/ldc/pull/1219
[4] http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation