LDC with Profile-Guided Optimization (PGO)
Johan Engelen via Digitalmars-d
digitalmars-d at puremagic.com
Tue Dec 15 15:05:38 PST 2015
Hi all,
I have been working on adding profile-guided optimization (PGO)
to LDC [1][2][3].
At this point, I'd like to hear your input and hope you can help
with testing!
Unfortunately, to try it out, you will need to build LDC with
LLVM 3.7 yourself. PGO should work on OS X, Linux, and Windows.
A first implementation is mostly complete now: it can generate an
executable that writes out profile data, and it can use that
profile data during a second compilation pass to tell LLVM about
branch frequencies. LDC itself does not perform any profile-based
optimizations (yet); that is left to LLVM.
It works like PGO with Clang, using the -fprofile-instr-generate
and -fprofile-instr-use command-line options [4]:
> ldc2 -fprofile-instr-generate=test.profraw -run test.d
> llvm-profdata merge test.profraw -output test.profdata
> ldc2 -fprofile-instr-use=test.profdata test.d -of=test
You should now have the executable "test" with an amazing
performance boost ;-)
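If you want to repeat this cycle for several programs, the three steps above can be scripted. The following is only a rough sketch in Python, not part of LDC: the helper names pgo_commands and run_pgo are made up for illustration, and it assumes the option spelling -fprofile-instr-use named in the text and that ldc2 and llvm-profdata are on your PATH.

```python
import subprocess


def pgo_commands(source="test.d", raw="test.profraw",
                 merged="test.profdata", exe="test"):
    """Return the three PGO steps as argv lists (hypothetical helper)."""
    return [
        # 1. Build an instrumented executable and run it to write raw profile data.
        ["ldc2", f"-fprofile-instr-generate={raw}", "-run", source],
        # 2. Convert the raw profile into the indexed format the compiler reads.
        ["llvm-profdata", "merge", raw, "-output", merged],
        # 3. Recompile using the profile data.
        ["ldc2", f"-fprofile-instr-use={merged}", source, f"-of={exe}"],
    ]


def run_pgo(**kwargs):
    """Run the three steps in order, stopping at the first failure."""
    for argv in pgo_commands(**kwargs):
        subprocess.run(argv, check=True)
```

Calling run_pgo(source="myprog.d", exe="myprog") would run the same three commands for another (hypothetical) source file; pgo_commands() alone just returns the command lines so you can inspect them first.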
You can inspect the generated code using LDC's -output-ll switch.
Functions should be annotated with call frequencies, and most
branches should be annotated with branch_weights metadata. For
example:
> define void @for_loop() #0 !prof !12
> ...
> !12 = !{!"function_entry_count", i64 234}
for "void for_loop()" that is called 234 times, and
> br i1 %3, label %if, label %else, !prof !17
> ...
> !17 = !{!"branch_weights", i32 5, i32 3}
for "if (condition) {...} else {...}"
The branch_weights have an offset of 1, so the above means that
the condition was true 4 times (5 - 1) and false 2 times (3 - 1).
If a certain piece of code is never executed, no metadata is added
(i.e. you won't see {!"branch_weights", i32 1, i32 1}). Some
branches are intentionally not instrumented/annotated if they lead
to terminating code (e.g. array bounds checks and auto-generated
null checks on "this" at class method entry).
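To read the real execution counts out of the -output-ll file, the offset has to be subtracted again. Here is a small sketch of that arithmetic in Python; the function name decode_branch_weights is hypothetical, and the regular expression only assumes the metadata layout shown in the example above.

```python
import re

# Matches e.g.  !17 = !{!"branch_weights", i32 5, i32 3}
BRANCH_WEIGHTS = re.compile(r'!\{!"branch_weights"((?:,\s*i32\s+\d+)+)\}')


def decode_branch_weights(ir_line, offset=1):
    """Return the execution counts encoded in a branch_weights line.

    Returns None if the line carries no branch_weights metadata.
    The default offset of 1 is the one described in the text above.
    """
    match = BRANCH_WEIGHTS.search(ir_line)
    if match is None:
        return None
    weights = [int(w) for w in re.findall(r'i32\s+(\d+)', match.group(1))]
    return [w - offset for w in weights]


print(decode_branch_weights('!17 = !{!"branch_weights", i32 5, i32 3}'))  # prints [4, 2]
```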
I hope you will be able to test and comment on the work. I am
very interested in hearing about performance gains (or losses, or
no change) for your programs. I am curious to learn for what kinds
of code it makes a difference in practice.
Thanks!
Johan
(Future work will probably include coverage analysis (llvm-cov)
and support for sampling-based profiles, which should fit
naturally with the current implementation.)
[1] http://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
[2] https://github.com/JohanEngelen/ldc/tree/pgo (warning: I
will rebase soon)
[3] https://github.com/ldc-developers/ldc/pull/1219
[4] http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation