LLVM IR influence on compiler debugging
bearophile
bearophileHUGS at lycos.com
Thu Jun 28 23:04:36 PDT 2012
This is a very easy-to-read article about the design of LLVM:
http://www.drdobbs.com/architecture-and-design/the-design-of-llvm/240001128
It explains what the IR is:
>The most important aspect of its design is the LLVM Intermediate
>Representation (IR), which is the form it uses to represent code
>in the compiler. LLVM IR [...] is itself defined as a first
>class language with well-defined semantics.<
>In particular, LLVM IR is both well specified and the only
>interface to the optimizer. This property means that all you
>need to know to write a front end for LLVM is what LLVM IR is,
>how it works, and the invariants it expects. Since LLVM IR has a
>first-class textual form, it is both possible and reasonable to
>build a front end that outputs LLVM IR as text, then uses UNIX
>pipes to send it through the optimizer sequence and code
>generator of your choice. It might be surprising, but this is
>actually a pretty novel property to LLVM and one of the major
>reasons for its success in a broad range of different
>applications. Even the widely successful and relatively
>well-architected GCC compiler does not have this property: its
>GIMPLE mid-level representation is not a self-contained
>representation.<
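To make that "first-class textual form" concrete, here is a minimal sketch of what a module looks like as plain text (the function name is illustrative; the syntax is ordinary LLVM IR):

```llvm
; add.ll -- an LLVM IR module in its textual form
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add nsw i32 %a, %b   ; nsw = "no signed wrap"
  ret i32 %sum
}
```

Because this is just a text file, it can be fed through the toolchain with UNIX pipes as the article describes, e.g. `opt -S -O2 add.ll | llc` to run the optimizer and then the code generator of your choice.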
That IR greatly simplifies debugging the compiler. I think this is
important (and I think it partially explains why Clang was developed
so quickly):
>Compilers are very complicated, and quality is important,
>therefore testing is critical. For example, after fixing a bug
>that caused a crash in an optimizer, a regression test should be
>added to make sure it doesn't happen again. The traditional
>approach to testing this is to write a .c file (for example)
>that is run through the compiler, and to have a test harness
>that verifies that the compiler doesn't crash. This is the
>approach used by the GCC test suite, for example. The problem
>with this approach is that the compiler consists of many
>different subsystems and even many different passes in the
>optimizer, all of which have the opportunity to change what the
>input code looks like by the time it gets to the previously
>buggy code in question. If something changes in the front end or
>an earlier optimizer, a test case can easily fail to test what
>it is supposed to be testing. By using the textual form of LLVM
>IR with the modular optimizer, the LLVM test suite has highly
>focused regression tests that can load LLVM IR from disk, run it
>through exactly one optimization pass, and verify the expected
>behavior. Beyond crashing, a more complicated behavioral test
>wants to verify that an optimization is actually performed.
>[...] While this might seem like a really trivial example, this
>is very difficult to test by writing .c files: front ends often
>do constant folding as they parse, so it is very difficult and
>fragile to write code that makes its way downstream to a
>constant folding optimization pass. Because we can load LLVM IR
>as text and send it through the specific optimization pass we're
>interested in, then dump out the result as another text file, it
>is really straightforward to test exactly what we want, both for
>regression and feature tests.<
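A minimal sketch of such a focused test, assuming the `instcombine` pass and LLVM's FileCheck-style `CHECK` lines (the file name is made up; something like `opt -S -instcombine fold.ll | FileCheck fold.ll` would drive it):

```llvm
; fold.ll -- regression test for one specific optimization pass.
; A C front end would fold "3 + 4" while parsing, so a .c test
; could never reach this pass; starting from IR text sidesteps that.
; CHECK: ret i32 7
define i32 @test() {
  %r = add i32 3, 4
  ret i32 %r
}
```

The test exercises exactly one pass on exactly the input it cares about, so changes elsewhere in the front end or earlier passes cannot silently defeat it.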
Bye,
bearophile
More information about the Digitalmars-d mailing list