Standard way to supply hints to branches

Manu turkeyman at
Wed Sep 11 22:23:52 UTC 2024

On Wed, 11 Sept 2024 at 18:26, Walter Bright via Digitalmars-d <
digitalmars-d at> wrote:

> On 9/11/2024 4:44 AM, Manu wrote:
> > Okay, I see. You're depending on the optimiser to specifically collapse
> > the goto into the branch as a simplification.
> Actually, the same code is generated without optimization. All it's doing
> is removing blocks that consist of nothing but "goto". It's a trivial
> optimization, and was there in the earliest version of the compiler.
> > Surely that's not even remotely reliable. There are several ways to
> > optimise that function, and I see no reason an optimiser would reliably
> > choose a construct like you show.
> gcc -O does more or less the same thing.
> > I'm actually a little surprised; a lifetime of experience with this sort
> > of thing might have led me to predict that the optimiser would /actually/
> > shift the `return 0` up into the place of the goto, effectively
> > eliminating the goto... I'm sure I've seen optimisers do that
> > transformation before, but I can't recall ever noting an instance of code
> > generation that looks like what you pasted... I reckon I might have
> > spotted that before.
> The goto remains in the gcc -O version.
> > ... and turns out, I'm right. I was so surprised with the codegen you
> > present that I pulled out compiler explorer and ran some experiments.
> > I tested GCC and Clang for x86, MIPS, and PPC, all of which I am
> > extremely familiar with, and all of them optimise the way I predicted.
> > None of them showed a pattern like you presented here.
> gcc -O produced:
> ```
> foo:
>      mov       EAX,0
>      test      EDI,EDI
>      jne       L1B
>      sub       RSP,8
>      call      bar@PC32
>      mov       EAX,1
>      add       RSP,8
> L1B:    rep
>      ret
> baz:
>      mov       EAX,0
>      test      EDI,EDI
>      jne       L38
>      sub       RSP,8
>      call      bar@PC32
>      mov       EAX,1
>      add       RSP,8
> L38:    rep
>      ret
> ```
> > If I had to guess; I would actually imagine that GCC and Clang will very
> > deliberately NOT make a transformation like the one you show, for the
> > precise reason that such a transformation changes the nature of static
> > branch prediction which someone might have written code to rely on. It
> > would be dangerous for the optimiser to transform the code in the way you
> > show, and so it doesn't.
> The transformation is (intermediate code):
> ```
> if (i) goto L2; else goto L4;
> L2:
>     goto L3;
> L4:
>     bar();
>     return 1;
> L3:
>     return 0;
> ```
> becomes:
> ```
> if (!i) goto L3; else goto L4;
> L4:
>      bar();
>      return 1;
> L3:
>      return 0;
> ```
> I.e. the goto->goto was replaced with a single goto.
> It's not dangerous or weird at all, nor does it interfere with branch
> prediction.

It inverts the condition. In the case on trial, that inverts the branch
prediction.

But that aside, I'm even more confused; I couldn't reproduce that in any of
my tests.
Here's a bunch of my test compiles... they all turn out the same:


        test    edi, edi
        je      .L10
        xor     eax, eax
        sub     rsp, 8
        call    bar()
        mov     eax, 1
        add     rsp, 8


        xor     eax, eax
        test    edi, edi
        je      .LBB0_1
        push    rax
        call    bar()@PLT
        mov     eax, 1
        add     rsp, 8


        cmpwi 0,3,0
        beq- 0,.L9
        li 3,0
        stwu 1,-16(1)
        mflr 0
        stw 0,20(1)
        bl bar()
        lwz 0,20(1)
        li 3,1
        addi 1,1,16
        mtlr 0


        cbz     w0, .L9
        mov     w0, 0
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      bar()
        mov     w0, 1
        ldp     x29, x30, [sp], 16


        beqz    $4, $BB0_2
        addiu   $2, $zero, 0
        jr      $ra
        addiu   $sp, $sp, -24
        sw      $ra, 20($sp)
        sw      $fp, 16($sp)
        move    $fp, $sp
        jal     bar()
        addiu   $2, $zero, 1
        move    $sp, $fp
        lw      $fp, 16($sp)
        lw      $ra, 20($sp)
        jr      $ra
        addiu   $sp, $sp, 24

Even if you can manage to convince a compiler to write the output you're
alleging, I would never imagine for a second that's a reliable strategy.
The optimiser could do all kinds of things... even though in all my
experiments, it does exactly what I predicted it would.