<div class="gmail_quote">On 20 June 2012 13:59, Don Clugston <span dir="ltr">&lt;<a href="mailto:dac@nospam.com" target="_blank">dac@nospam.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">You and I seem to be from different planets. I have almost never written an asm function which was suitable for inlining.</div>

<br>

Take a look at std.internal.math.biguintX86.d<br>

<br>

I do not know how to write that code without inline asm.<br>

</blockquote></div><br><div>Interesting.</div><div>I wish I could paste some counter-examples, but they're all proprietary &gt;_&lt;</div><div><br></div><div>I think the key detail here is where you stated that they _always_ include a loop. Is this because it's hard to coax the compiler into the correct interaction with the flags register?</div>

<div>I'd be interested to compare the compiled D code with your hand-written asm code, to see exactly where the optimiser goes wrong. At a brief glance, it doesn't look like you're exploiting too many tricks; it's just nice, tight, hand-written code, which the optimiser should theoretically be able to get right...</div>

<div><br></div><div>I find optimisers are very good at code simplification, provided that you massage the code/expressions to neatly match any architectural quirks.</div><div>I also appreciate that x86 is possibly the hardest architecture for an optimiser to generate good code for...</div>