Standard way to supply hints to branches
Manu
turkeyman at gmail.com
Wed Sep 11 22:23:52 UTC 2024
On Wed, 11 Sept 2024 at 18:26, Walter Bright via Digitalmars-d <
digitalmars-d at puremagic.com> wrote:
> On 9/11/2024 4:44 AM, Manu wrote:
> > Okay, I see. You're depending on the optimiser to specifically collapse
> the goto
> > into the branch as a simplification.
>
> Actually, the same code is generated without optimization. All it's doing
> is
> removing blocks that consist of nothing but "goto". It's a trivial
> optimization,
> and was there in the earliest version of the compiler.
>
>
> > Surely that's not even remotely reliable. There are several ways to
> optimise
> > that function, and I see no reason an optimiser would reliably choose a
> > construct like you show.
>
> gcc -O does more or less the same thing.
>
>
> > I'm actually a little surprised; a lifetime of experience with this sort
> of
> > thing might have lead me to predict that the optimiser would /actually/
> shift
> > the `return 0` up into the place of the goto, effectively eliminating
> the
> > goto... I'm sure I've seen optimisers do that transformation before, but
> I can't
> > recall ever noting an instance of code generation that looks like what
> you
> > pasted... I reckon I might have spotted that before.
>
> The goto remains in the gcc -O version.
>
>
> > ... and turns out, I'm right. I was so surprised with the codegen you
> present
> > that I pulled out compiler explorer and ran some experiments.
> > I tested GCC and Clang for x86, MIPS, and PPC, all of which I am
> extremely
> > familiar with, and all of them optimise the way I predicted. None of
> them showed
> > a pattern like you presented here.
>
> gcc -O produced:
>
> ```
> foo:
> mov EAX,0
> test EDI,EDI
> jne L1B
> sub RSP,8
> call bar at PC32
> mov EAX,1
> add RSP,8
> L1B: rep
> ret
> baz:
> mov EAX,0
> test EDI,EDI
> jne L38
> sub RSP,8
> call bar at PC32
> mov EAX,1
> add RSP,8
> L38: rep
> ret
> ```
>
> > If I had to guess; I would actually imagine that GCC and Clang will very
> > deliberately NOT make a transformation like the one you show, for the
> precise
> > reason that such a transformation changes the nature of static branch
> prediction
> > which someone might have written code to rely on. It would be dangerous
> for the
> > optimiser to transform the code in the way you show, and so it doesn't.
>
> The transformation is (intermediate code):
> ```
> if (i) goto L2; else goto L4;
> L2:
> goto L3;
> L4:
> bar();
> return 1;
> L3:
> return 0;
> ```
> becomes:
> ```
> if (!i) goto L3; else goto L4;
> L4:
> bar();
> return 1;
> L3:
> return 0;
> ```
> I.e. the goto->goto was replaced with a single goto.
>
> It's not dangerous or weird at all, nor does it interfere with branch
> prediction.
>
It inverts the condition. In the case on trial, that inverts the branch
prediction.
But that aside, I'm even more confused; I couldn't reproduce that in any of
my tests.
Here's a bunch of my test copiles... they all turn out the same:
gcc:
baz(int):
test edi, edi
je .L10
xor eax, eax
ret
.L10:
sub rsp, 8
call bar()
mov eax, 1
add rsp, 8
ret
clang:
baz(int):
xor eax, eax
test edi, edi
je .LBB0_1
ret
.LBB0_1:
push rax
call bar()@PLT
mov eax, 1
add rsp, 8
ret
gcc-powerpc:
baz(int):
cmpwi 0,3,0
beq- 0,.L9
li 3,0
blr
.L9:
stwu 1,-16(1)
mflr 0
stw 0,20(1)
bl bar()
lwz 0,20(1)
li 3,1
addi 1,1,16
mtlr 0
blr
arm64:
baz(int):
cbz w0, .L9
mov w0, 0
ret
.L9:
stp x29, x30, [sp, -16]!
mov x29, sp
bl bar()
mov w0, 1
ldp x29, x30, [sp], 16
ret
clang-mips:
baz(int):
beqz $4, $BB0_2
addiu $2, $zero, 0
jr $ra
nop
$BB0_2:
addiu $sp, $sp, -24
sw $ra, 20($sp)
sw $fp, 16($sp)
move $fp, $sp
jal bar()
nop
addiu $2, $zero, 1
move $sp, $fp
lw $fp, 16($sp)
lw $ra, 20($sp)
jr $ra
addiu $sp, $sp, 24
Even if you can manage to convince a compiler to write the output you're
alleging, I would never imagine for a second that's a reliable strategy.
The optimiser could do all kinds of things... even though in all my
experiments, it does exactly what I predicted it would.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20240911/db7a05be/attachment-0001.htm>
More information about the Digitalmars-d
mailing list