<div dir="ltr"><div dir="ltr">On Wed, 11 Sept 2024 at 06:21, Walter Bright via Digitalmars-d <<a href="mailto:digitalmars-d@puremagic.com">digitalmars-d@puremagic.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Compile the following with -vasm -O:<br>
<br>
```<br>
void bar();<br>
<br>
int foo(int i)<br>
{<br>
if (i)<br>
return 0;<br>
bar();<br>
return 1;<br>
}<br>
<br>
int baz(int i)<br>
{<br>
if (i)<br>
goto Lreturn0;<br>
bar();<br>
return 1;<br>
<br>
Lreturn0:<br>
return 0;<br>
}<br>
```<br>
and you get:<br>
```<br>
_D5test93fooFiZi:<br>
0000: 55 push RBP<br>
0001: 48 8B EC mov RBP,RSP<br>
0004: 85 FF test EDI,EDI<br>
0006: 74 04 je Lc<br>
0008: 31 C0 xor EAX,EAX // hot path<br>
000a: 5D pop RBP<br>
000b: C3 ret<br>
000c: E8 00 00 00 00 call L0 // cold path<br>
0011: B8 01 00 00 00 mov EAX,1<br>
0016: 5D pop RBP<br>
0017: C3 ret<br>
_D5test93bazFiZi:<br>
0000: 55 push RBP<br>
0001: 48 8B EC mov RBP,RSP<br>
0004: 85 FF test EDI,EDI<br>
0006: 75 0C jne L14<br>
0008: E8 00 00 00 00 call L0 // hot path<br>
000d: B8 01 00 00 00 mov EAX,1<br>
0012: 5D pop RBP<br>
0013: C3 ret<br>
0014: 31 C0 xor EAX,EAX // cold path<br>
0016: 5D pop RBP<br>
0017: C3 ret<br>
```<br></blockquote><div><br></div><div>Okay, I see. You're depending on the optimiser to specifically collapse the goto into the branch as a simplification.<br></div><div>Surely that's not even remotely reliable. There are several ways to optimise that function, and I see no reason an optimiser would reliably choose the construct you show.</div><div><br></div><div>I'm actually a little surprised; a lifetime of experience with this sort of thing would have led me to predict that the optimiser would <i>actually</i> shift the `return 0` up into the place of the goto, effectively eliminating the goto... I'm sure I've seen optimisers do that transformation before, but I can't recall ever seeing code generation that looks like what you pasted... I reckon I would have noticed it by now.</div><div><br></div><div>... and it turns out I'm right. I was so surprised by the codegen you presented that I pulled out Compiler Explorer and ran some experiments.</div><div>I tested GCC and Clang for x86, MIPS, and PPC, all of which I am extremely familiar with, and all of them optimise the way I predicted. None of them showed a pattern like the one you presented here.</div><div><br></div><div>If I had to guess, I would imagine that GCC and Clang very deliberately do NOT make a transformation like the one you show, for the precise reason that such a transformation changes the nature of static branch prediction, which someone might have written code to rely on. It would be dangerous for the optimiser to transform the code in the way you show, and so it doesn't.</div></div></div>
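<div><br></div><div>To make the predicted transformation concrete, here is a minimal C sketch of the source-level equivalence being described: hoisting `return 0` into the place of the goto turns `baz` into the same shape as `foo`. The body of `bar`, the `bar_calls` counter, and the name `baz_hoisted` are assumptions added for illustration only; the sketch shows that the two forms behave identically and says nothing about the block layout any particular compiler emits.</div>

```c
/* Hypothetical stand-in for the external bar(); it counts calls so the
   two variants can be compared. Not part of the original example. */
static int bar_calls = 0;

void bar(void)
{
    bar_calls++;
}

/* Original form: the early exit jumps to a trailing label. */
int baz(int i)
{
    if (i)
        goto Lreturn0;
    bar();
    return 1;

Lreturn0:
    return 0;
}

/* Hand-written equivalent of the predicted transformation: the
   `return 0` is moved up into the place of the goto, which removes
   the goto and leaves the same shape as foo(). */
int baz_hoisted(int i)
{
    if (i)
        return 0;
    bar();
    return 1;
}
```

<div>Pasting both functions into Compiler Explorer and comparing the output at the same optimisation level is one way to check whether a given compiler treats them identically.</div>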