How to track down a bad llvm optimization pass

Thu Jun 30 11:10:04 PDT 2016

On 30 Jun 2016, at 16:40, Joakim via digitalmars-d-ldc wrote:
> I assumed that undef was some kind of poison value

undef is indeed "some kind of poison value", in that each use of it 
evaluates to a (potentially different) arbitrary bit string. By itself, 
using an undef isn't undefined behaviour, but for many operations it 
ultimately leads to it, because there are bit string inputs for which 
those operations are undefined (e.g. loads and stores through an undef 
pointer).

LLVM also has a concept called "poison values", which are undefs with 
slightly stronger semantics, produced by C-style signed integer 
arithmetic overflow and similar operations. In loose terms, any 
operation that depends on a poison value in an externally visible way 
has undefined behaviour.
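To make the poison case a bit more concrete: signed overflow is 
undefined in C, which is why Clang can lower a signed `int` addition to 
`add nsw`, and the LangRef states that an `nsw` add which overflows 
yields poison. Code that wants to stay clear of this has to check before 
adding. A minimal C sketch; `checked_add_int` is a hypothetical helper 
for illustration, not part of any library:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out only if the mathematical
   result fits in an int. On overflow it returns false and never performs
   the addition, so no overflowing (in IR terms: poison-producing `nsw`)
   add is ever executed. */
static bool checked_add_int(int a, int b, int *out)
{
    /* INT_MAX - b cannot overflow for b > 0, nor INT_MIN - b for b < 0. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

Clang and GCC also offer `__builtin_add_overflow`, which performs the 
same check without spelling out the comparisons by hand.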

I usually find the LLVM language reference 
(http://llvm.org/docs/LangRef.html) to be quite a clear resource for 
these sorts of questions.

> [should] the inlining pass […] just be returning undef […]? Since 
> this is at compile-time, I don't think it should. […]
> Are we supposed to be running sanitizers or something else to avoid 
> these bugs?

First off, as things currently stand, this is certainly not a bug in 
LLVM. The LangRef documents the result of the lshr instruction as 
undefined when the shift amount is equal to or larger than the bit 
width of the operand. Replacing the whole call with `undef` is thus a 
valid IR transformation.

So much for LLVM working as designed. The question, of course, is 
whether LLVM, being a tool for compiler writers, should emit a warning 
on such transformations. And here things suddenly become muddy. Yes, in 
this case a warning would have been useful. However, if the code was 
never actually reached at run time, the warning would be a false 
positive. This could be solved by offering a way to declare basic 
blocks/functions as reachable for that purpose, but that introduces 
extra complexity. I wouldn't be surprised if the need to design 
something along these lines was the main reason why LLVM does not try to 
report such conditions.

Of course, language frontends can always emit dynamic checks to avoid 
executing llvm::Instructions with UB-inducing arguments, whether in the 
form of sanitisers or by default, as part of faithfully lowering their 
language's semantics.
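As a rough illustration of the "by default" option, here is a C sketch 
of what a frontend might emit for a source-level `x >> n` on a 32-bit 
unsigned value. It assumes a hypothetical language rule that 
out-of-range shifts yield 0 (Go defines this for unsigned operands); 
`lowered_shr_u32` is an illustrative name, not an LDC or LLVM function:

```c
#include <stdint.h>

/* Hypothetical lowering of `x >> n` for a language that defines shifts
   by >= the bit width to produce 0. The guard ensures the actual shift,
   which corresponds to an LLVM `lshr`, only ever sees n < 32, so the
   optimizer has no out-of-range shift to fold into undef. */
static uint32_t lowered_shr_u32(uint32_t x, uint32_t n)
{
    if (n >= 32)
        return 0;      /* result defined by the (assumed) language rule */
    return x >> n;     /* shift amount in range: well-defined */
}
```

A frontend for a language that instead treats such shifts as errors 
could call `abort()` in the out-of-range branch; which choice is right 
depends entirely on the source language's semantics.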

  — David