<div dir="ltr"><div dir="ltr">On Wed, 11 Sept 2024 at 18:26, Walter Bright via Digitalmars-d <<a href="mailto:digitalmars-d@puremagic.com">digitalmars-d@puremagic.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 9/11/2024 4:44 AM, Manu wrote:<br>
> Okay, I see. You're depending on the optimiser to specifically collapse the goto <br>
> into the branch as a simplification.<br>
<br>
Actually, the same code is generated without optimization. All it's doing is <br>
removing blocks that consist of nothing but "goto". It's a trivial optimization, <br>
and was there in the earliest version of the compiler.<br>
<br>
<br>
> Surely that's not even remotely reliable. There are several ways to optimise <br>
> that function, and I see no reason an optimiser would reliably choose a <br>
> construct like you show.<br>
<br>
gcc -O does more or less the same thing.<br>
<br>
<br>
> I'm actually a little surprised; a lifetime of experience with this sort of <br>
> thing might have led me to predict that the optimiser would /actually/ shift <br>
> the `return 0` up into the place of the goto, effectively eliminating the <br>
> goto... I'm sure I've seen optimisers do that transformation before, but I can't <br>
> recall ever noting an instance of code generation that looks like what you <br>
> pasted... I reckon I might have spotted that before.<br>
<br>
The goto remains in the gcc -O version.<br>
<br>
<br>
> ... and turns out, I'm right. I was so surprised with the codegen you present <br>
> that I pulled out compiler explorer and ran some experiments.<br>
> I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely <br>
> familiar with, and all of them optimise the way I predicted. None of them showed <br>
> a pattern like you presented here.<br>
<br>
gcc -O produced:<br>
<br>
```<br>
foo:<br>
mov EAX,0<br>
test EDI,EDI<br>
jne L1B<br>
sub RSP,8<br>
call bar@PC32<br>
mov EAX,1<br>
add RSP,8<br>
L1B: rep<br>
ret<br>
baz:<br>
mov EAX,0<br>
test EDI,EDI<br>
jne L38<br>
sub RSP,8<br>
call bar@PC32<br>
mov EAX,1<br>
add RSP,8<br>
L38: rep<br>
ret<br>
```<br>
<br>
> If I had to guess; I would actually imagine that GCC and Clang will very <br>
> deliberately NOT make a transformation like the one you show, for the precise <br>
> reason that such a transformation changes the nature of static branch prediction <br>
> which someone might have written code to rely on. It would be dangerous for the <br>
> optimiser to transform the code in the way you show, and so it doesn't.<br>
<br>
The transformation is (intermediate code):<br>
```<br>
if (i) goto L2; else goto L4;<br>
L2:<br>
goto L3;<br>
L4:<br>
bar();<br>
return 1;<br>
L3:<br>
return 0;<br>
```<br>
becomes:<br>
```<br>
if (!i) goto L3; else goto L4;<br>
L4:<br>
bar();<br>
return 1;<br>
L3:<br>
return 0;<br>
```<br>
I.e. the goto->goto was replaced with a single goto.<br>
<br>
It's not dangerous or weird at all, nor does it interfere with branch prediction.<br></blockquote><div><br></div><div>It inverts the condition. In the case on trial, that inverts the branch prediction.</div><div> </div><div>But that aside, I'm even more confused; I couldn't reproduce that in any of my tests.<br></div><div>Here's a bunch of my test compiles... they all turn out the same:</div><div><br></div><div>gcc:</div><div><br></div><div><div style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas,"Liberation Mono",Courier,monospace,Consolas,"Courier New",monospace;font-size:14px;line-height:19px;white-space:pre"><div><span style="color:rgb(0,128,128)">baz(int):</span></div><div> <span style="color:rgb(0,0,255)">test</span> <span style="color:rgb(72,100,170)">edi</span>, <span style="color:rgb(72,100,170)">edi</span></div><div> <span style="color:rgb(0,0,255)">je</span> <span style="color:rgb(0,128,128)">.L10</span></div><div> <span style="color:rgb(0,0,255)">xor</span> <span style="color:rgb(72,100,170)">eax</span>, <span style="color:rgb(72,100,170)">eax</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div><div><span style="color:rgb(0,128,128)">.L10:</span></div><div> <span style="color:rgb(0,0,255)">sub</span> <span style="color:rgb(72,100,170)">rsp</span>, <span style="color:rgb(9,134,88)">8</span></div><div> <span style="color:rgb(0,0,255)">call</span> <span style="color:rgb(0,128,128)">bar</span>()</div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(72,100,170)">eax</span>, <span style="color:rgb(9,134,88)">1</span></div><div> <span style="color:rgb(0,0,255)">add</span> <span style="color:rgb(72,100,170)">rsp</span>, <span style="color:rgb(9,134,88)">8</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div><div><span style="color:rgb(0,0,255)"><br></span></div></div></div><div>clang:</div><div><br></div><div><div 
style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas,"Liberation Mono",Courier,monospace,Consolas,"Courier New",monospace;font-size:14px;line-height:19px;white-space:pre"><div><span style="color:rgb(0,128,128)">baz(int):</span></div><div> <span style="color:rgb(0,0,255)">xor</span> <span style="color:rgb(72,100,170)">eax</span>, <span style="color:rgb(72,100,170)">eax</span></div><div> <span style="color:rgb(0,0,255)">test</span> <span style="color:rgb(72,100,170)">edi</span>, <span style="color:rgb(72,100,170)">edi</span></div><div> <span style="color:rgb(0,0,255)">je</span> <span style="color:rgb(0,128,128)">.LBB0_1</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div><div><span style="color:rgb(0,128,128)">.LBB0_1:</span></div><div> <span style="color:rgb(0,0,255)">push</span> <span style="color:rgb(72,100,170)">rax</span></div><div> <span style="color:rgb(0,0,255)">call</span> <span style="color:rgb(0,128,128)">bar</span>()<span style="color:rgb(0,128,128)">@PLT</span></div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(72,100,170)">eax</span>, <span style="color:rgb(9,134,88)">1</span></div><div> <span style="color:rgb(0,0,255)">add</span> <span style="color:rgb(72,100,170)">rsp</span>, <span style="color:rgb(9,134,88)">8</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div><div><span style="color:rgb(0,0,255)"><br></span></div></div></div><div>gcc-powerpc:</div><div><br></div><div><div style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas,"Liberation Mono",Courier,monospace,Consolas,"Courier New",monospace;font-size:14px;line-height:19px;white-space:pre"><div><span style="color:rgb(0,128,128)">baz(int):</span></div><div> <span style="color:rgb(0,0,255)">cmpwi</span> <span style="color:rgb(9,134,88)">0</span>,<span style="color:rgb(9,134,88)">3</span>,<span style="color:rgb(9,134,88)">0</span></div><div> <span style="color:rgb(0,0,255)">beq</span>- <span 
style="color:rgb(9,134,88)">0</span>,<span style="color:rgb(0,128,128)">.L9</span></div><div> <span style="color:rgb(0,0,255)">li</span> <span style="color:rgb(9,134,88)">3</span>,<span style="color:rgb(9,134,88)">0</span></div><div> <span style="color:rgb(0,0,255)">blr</span></div><div><span style="color:rgb(0,128,128)">.L9:</span></div><div> <span style="color:rgb(0,0,255)">stwu</span> <span style="color:rgb(9,134,88)">1</span>,-<span style="color:rgb(9,134,88)">16</span>(<span style="color:rgb(9,134,88)">1</span>)</div><div> <span style="color:rgb(0,0,255)">mflr</span> <span style="color:rgb(9,134,88)">0</span></div><div> <span style="color:rgb(0,0,255)">stw</span> <span style="color:rgb(9,134,88)">0</span>,<span style="color:rgb(9,134,88)">20</span>(<span style="color:rgb(9,134,88)">1</span>)</div><div> <span style="color:rgb(0,0,255)">bl</span> <span style="color:rgb(0,128,128)">bar</span>()</div><div> <span style="color:rgb(0,0,255)">lwz</span> <span style="color:rgb(9,134,88)">0</span>,<span style="color:rgb(9,134,88)">20</span>(<span style="color:rgb(9,134,88)">1</span>)</div><div> <span style="color:rgb(0,0,255)">li</span> <span style="color:rgb(9,134,88)">3</span>,<span style="color:rgb(9,134,88)">1</span></div><div> <span style="color:rgb(0,0,255)">addi</span> <span style="color:rgb(9,134,88)">1</span>,<span style="color:rgb(9,134,88)">1</span>,<span style="color:rgb(9,134,88)">16</span></div><div> <span style="color:rgb(0,0,255)">mtlr</span> <span style="color:rgb(9,134,88)">0</span></div><div> <span style="color:rgb(0,0,255)">blr</span></div><div><span style="color:rgb(0,0,255)"><br></span></div></div></div><div>arm64:</div><div><br></div><div><div style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas,"Liberation Mono",Courier,monospace,Consolas,"Courier New",monospace;font-size:14px;line-height:19px;white-space:pre"><div><span style="color:rgb(0,128,128)">baz(int):</span></div><div> <span style="color:rgb(0,0,255)">cbz</span> 
<span style="color:rgb(0,128,128)">w0</span>, <span style="color:rgb(0,128,128)">.L9</span></div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(0,128,128)">w0</span>, <span style="color:rgb(9,134,88)">0</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div><div><span style="color:rgb(0,128,128)">.L9:</span></div><div> <span style="color:rgb(0,0,255)">stp</span> <span style="color:rgb(0,128,128)">x29</span>, <span style="color:rgb(0,128,128)">x30</span>, [<span style="color:rgb(72,100,170)">sp</span>, -<span style="color:rgb(9,134,88)">16</span>]!</div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(0,128,128)">x29</span>, <span style="color:rgb(72,100,170)">sp</span></div><div> <span style="color:rgb(0,0,255)">bl</span> <span style="color:rgb(0,128,128)">bar</span>()</div><div> <span style="color:rgb(0,0,255)">mov</span> <span style="color:rgb(0,128,128)">w0</span>, <span style="color:rgb(9,134,88)">1</span></div><div> <span style="color:rgb(0,0,255)">ldp</span> <span style="color:rgb(0,128,128)">x29</span>, <span style="color:rgb(0,128,128)">x30</span>, [<span style="color:rgb(72,100,170)">sp</span>], <span style="color:rgb(9,134,88)">16</span></div><div> <span style="color:rgb(0,0,255)">ret</span></div></div></div><div><br></div><div>clang-mips:</div><div><br></div><div><div style="color:rgb(0,0,0);background-color:rgb(255,255,254);font-family:Consolas,"Liberation Mono",Courier,monospace,Consolas,"Courier New",monospace;font-size:14px;line-height:19px;white-space:pre"><div><span style="color:rgb(0,128,128)">baz(int):</span></div><div> <span style="color:rgb(0,0,255)">beqz</span> <span style="color:rgb(48,48,192)">$4</span>, <span style="color:rgb(48,48,192)">$BB0</span><span style="color:rgb(0,128,128)">_2</span></div><div> <span style="color:rgb(0,0,255)">addiu</span> <span style="color:rgb(48,48,192)">$2</span>, <span style="color:rgb(0,128,128)">$zero</span>, <span 
style="color:rgb(9,134,88)">0</span></div><div> <span style="color:rgb(0,0,255)">jr</span> <span style="color:rgb(0,128,128)">$ra</span></div><div> <span style="color:rgb(0,0,255)">nop</span></div><div><span style="color:rgb(0,128,128)">$BB0_2:</span></div><div> <span style="color:rgb(0,0,255)">addiu</span> <span style="color:rgb(0,128,128)">$sp</span>, <span style="color:rgb(0,128,128)">$sp</span>, -<span style="color:rgb(9,134,88)">24</span></div><div> <span style="color:rgb(0,0,255)">sw</span> <span style="color:rgb(0,128,128)">$ra</span>, <span style="color:rgb(9,134,88)">20</span>(<span style="color:rgb(0,128,128)">$sp</span>)</div><div> <span style="color:rgb(0,0,255)">sw</span> <span style="color:rgb(48,48,192)">$f</span><span style="color:rgb(0,128,128)">p</span>, <span style="color:rgb(9,134,88)">16</span>(<span style="color:rgb(0,128,128)">$sp</span>)</div><div> <span style="color:rgb(0,0,255)">move</span> <span style="color:rgb(48,48,192)">$f</span><span style="color:rgb(0,128,128)">p</span>, <span style="color:rgb(0,128,128)">$sp</span></div><div> <span style="color:rgb(0,0,255)">jal</span> <span style="color:rgb(0,128,128)">bar</span>()</div><div> <span style="color:rgb(0,0,255)">nop</span></div><div> <span style="color:rgb(0,0,255)">addiu</span> <span style="color:rgb(48,48,192)">$2</span>, <span style="color:rgb(0,128,128)">$zero</span>, <span style="color:rgb(9,134,88)">1</span></div><div> <span style="color:rgb(0,0,255)">move</span> <span style="color:rgb(0,128,128)">$sp</span>, <span style="color:rgb(48,48,192)">$f</span><span style="color:rgb(0,128,128)">p</span></div><div> <span style="color:rgb(0,0,255)">lw</span> <span style="color:rgb(48,48,192)">$f</span><span style="color:rgb(0,128,128)">p</span>, <span style="color:rgb(9,134,88)">16</span>(<span style="color:rgb(0,128,128)">$sp</span>)</div><div> <span style="color:rgb(0,0,255)">lw</span> <span style="color:rgb(0,128,128)">$ra</span>, <span style="color:rgb(9,134,88)">20</span>(<span 
style="color:rgb(0,128,128)">$sp</span>)</div><div> <span style="color:rgb(0,0,255)">jr</span> <span style="color:rgb(0,128,128)">$ra</span></div><div> <span style="color:rgb(0,0,255)">addiu</span> <span style="color:rgb(0,128,128)">$sp</span>, <span style="color:rgb(0,128,128)">$sp</span>, <span style="color:rgb(9,134,88)">24</span></div><div><span style="color:rgb(9,134,88)"><br></span></div><div>
<span style="color:rgb(34,34,34);font-family:Arial,Helvetica,sans-serif;font-size:small;white-space:normal;background-color:rgb(255,255,255)">Even if you can manage to convince a compiler to emit the output you're alleging, I would never imagine for a second that it's a reliable strategy. The optimiser could do all kinds of things... even though in all my experiments it does exactly what I predicted it would.</span>
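<div style="color:rgb(34,34,34);font-family:Arial,Helvetica,sans-serif;font-size:small;white-space:normal;background-color:rgb(255,255,255)">To make the point about the inverted condition concrete, here is a minimal C sketch. It is not the test case from this thread: bar is a hypothetical stand-in that only counts calls, the function names are invented, and the guess in baz_hinted that i is usually nonzero is an assumption for illustration. The first two functions return the same values and differ only in which outcome takes the branch, which is exactly what a static predictor keys on. The third uses __builtin_expect, a GCC/Clang builtin, to state the expectation in the source instead of leaving it to block layout.</div>
```c
/* Hypothetical stand-in for the bar() discussed in the thread; it only counts calls. */
static int bar_calls = 0;
static void bar(void) { bar_calls++; }

/* Layout A: the branch is taken when i is nonzero; the call is the fall-through path. */
int baz_branch_on_nonzero(int i)
{
    if (i) goto L3;
    bar();
    return 1;
L3:
    return 0;
}

/* Layout B: the branch is taken when i is zero; "return 0" is the fall-through path.
   This is the shape of the gcc and clang listings above. */
int baz_branch_on_zero(int i)
{
    if (!i) goto L4;
    return 0;
L4:
    bar();
    return 1;
}

/* Explicit hint (assumption for illustration: i is usually nonzero).
   The compiler may use the hint when laying out blocks, but the return values do not change. */
int baz_hinted(int i)
{
    if (__builtin_expect(i != 0, 1))
        return 0;
    bar();
    return 1;
}
```
<div style="color:rgb(34,34,34);font-family:Arial,Helvetica,sans-serif;font-size:small;white-space:normal;background-color:rgb(255,255,255)">All three agree on every input, so the choice between them is purely about which path the branch instruction jumps to. Whether a given compiler keeps the layout as written is a separate question, and the hint is only a request.</div>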
</div></div></div></div></div>