I'm currently optimizing the GC marking code and I'm running into quite a few problems with the compiler's inlining decisions. The compiler can't make good decisions here because it has no information about which branches are rarely executed. It would be nice to have @noinline, @forceinline and __builtin_expect.
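To illustrate what I mean, here is a rough sketch in C using the GCC/Clang spellings (`__attribute__((noinline))`, `__attribute__((always_inline))` and `__builtin_expect`) as stand-ins for the three features above. This is not the actual GC code. The `Object` and `MarkStack` types, the `LIKELY`/`UNLIKELY` macros and all function names are made up for the example, and the overflow handling is deliberately simplistic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macros around the GCC/Clang builtin. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

#define STACK_CAP 64

/* Hypothetical heap object: a mark bit plus an array of child pointers. */
typedef struct Object {
    uint8_t marked;
    size_t n_children;
    struct Object **children;
} Object;

/* Hypothetical fixed-size mark stack. */
typedef struct {
    Object *items[STACK_CAP];
    size_t len;
    size_t overflows;
} MarkStack;

/* Cold path: rarely executed, so keep it out of the hot loop.
   A real collector would need a recovery strategy here; this sketch
   only counts the event and drops the object. */
__attribute__((noinline))
static void mark_stack_overflow(MarkStack *s)
{
    s->overflows++;
}

/* Hot path: tiny, so ask the compiler to always inline it. */
static inline __attribute__((always_inline))
void mark_push(MarkStack *s, Object *o)
{
    if (UNLIKELY(s->len == STACK_CAP)) {
        mark_stack_overflow(s);
        return;
    }
    s->items[s->len++] = o;
}

/* Marks everything reachable from root; returns the number of newly
   marked objects. */
size_t mark_from(Object *root)
{
    MarkStack stack = { .len = 0, .overflows = 0 };
    size_t marked = 0;

    assert(root != NULL);
    if (!root->marked) {
        root->marked = 1;
        marked++;
        mark_push(&stack, root);
    }
    while (stack.len > 0) {
        Object *o = stack.items[--stack.len];
        for (size_t i = 0; i < o->n_children; i++) {
            Object *c = o->children[i];
            /* Assumption: null child slots are the rare case. */
            if (LIKELY(c != NULL) && !c->marked) {
                c->marked = 1;
                marked++;
                mark_push(&stack, c);
            }
        }
    }
    return marked;
}
```

The point is that the overflow branch is marked as unlikely and its body is kept out of line, so the inlined `mark_push` stays small inside the marking loop. Whether this helps in practice would have to be measured.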