<div class="gmail_quote">On 6 January 2012 22:40, Martin Nowak <span dir="ltr"><<a href="mailto:dawg@dawgfoto.de">dawg@dawgfoto.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Fri, 06 Jan 2012 20:00:15 +0100, Manu <<a href="mailto:turkeyman@gmail.com" target="_blank">turkeyman@gmail.com</a>> wrote:<br>

<br>

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

On 6 January 2012 20:17, Martin Nowak <<a href="mailto:dawg@dawgfoto.de" target="_blank">dawg@dawgfoto.de</a>> wrote:<br>

<br>

</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">

There is another benefit.<br>

Consider the following:<br>

<br>

__vec128 addps(__vec128 a, __vec128 b) pure<br>

{<br>

   __vec128 res = a;<br>

<br>

   if (__ctfe)<br>

   {<br>

       foreach(i; 0 .. 4)<br>

          res[i] += b[i];<br>

   }<br>

   else<br>

   {<br></div>

       asm (res, b)<div class="im"><br>

       {<br>

           addps res, b;<br>

       }<br>

   }<br>

   return res;<br>

<br>

}<br>

<br>

</div></blockquote><div class="im">

<br>

You don't need to use inline ASM to be able to do this, it will work the<br>

same with intrinsics.<br>

I've detailed numerous problems with using inline asm, and complications<br>

with extending the inline assembler to support this.<br>

<br>

</div></blockquote>

Don't get me wrong here. The idea is to find out if intrinsics<br>

can be built with the help of inlineable asm functions.<br>

The ctfe support is one good reason to go with a library solution.</blockquote><div><br></div><div>/agree, this is a nice argument to support putting it in libraries.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Most compilers can't reschedule code around inline asm blocks. There are a<br>

lot of reasons for this; Google can help you.<br>

The main reason is that a COMPILER doesn't attempt to understand the<br>

assembly it's being asked to insert inline. The information that it may use<br>

</blockquote></div>

It doesn't have to understand the assembly.<br>

Wrapping these in functions creates an IR expression with inputs and outputs.<br>

Declaring them as pure gives the compiler a free hand to apply whatever<br>

optimizations it normally does on an IR tree.<br>

Common subexpression elimination, removing dead expressions...</blockquote><div><br></div><div>These functions shouldn't be functions... if they're not all inlined, then the implementation is broken.</div><div>Once you inline all these micro asm blocks, say 100 small asm blocks in a single function, you're giving the optimiser a very hard time.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Same problem as above. The compiler would need to understand enough about<br>

assembly to perform optimisation on the assembly itself to clean this up.<br>

Using intrinsics, the register allocation, load/store code, etc. is all<br>

in the regular realm of compiling the language, and the code generation and<br>

optimisation will all work as usual.<br>

<br>

</blockquote></div>

There is no informational difference between the intrinsic<br>

<br>

__m128 _mm_add_ps(__m128 a, __m128 b);<br>

<br>

and an inline assembler version<br></blockquote><div><br></div><div>There is, actually. To the compiler, the intrinsic is a normal function, with a hook in the code generator that emits the appropriate opcode during actual code generation.</div>

<div>On most compilers, inline asm, on the other hand, is opaque to the compiler. The optimiser can't do much with it, because it doesn't know what the inline asm has done, and the code generator just pastes your asm code inline where you told it to. It doesn't know whether you've written to aliased variables, called functions, etc., so it can no longer safely rearrange code around the inline asm block, which means it's not free to pipeline the code efficiently.</div>
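<div><br></div><div>To make the comparison concrete, here is a rough sketch in C rather than D, since _mm_add_ps is the C intrinsic. It assumes GCC or Clang targeting x86 with SSE, and falls back to a scalar loop elsewhere. The names vec4, add_scalar, add_intrinsic, add_inline_asm and check_paths_agree are made up for illustration. The asm version uses GCC-style extended asm, which declares its input and output operands much like the asm (res, b) idea, but the instruction text itself is still just a string as far as the optimiser is concerned.</div>

```c
#include <assert.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* __m128, _mm_add_ps, _mm_loadu_ps, _mm_storeu_ps */
#endif

/* Hypothetical 4-float vector type, used only for this illustration. */
typedef struct { float v[4]; } vec4;

/* Scalar reference path: roughly what a __ctfe branch would evaluate. */
static vec4 add_scalar(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Intrinsic path: the compiler sees an ordinary typed operation. */
static vec4 add_intrinsic(vec4 a, vec4 b)
{
#if defined(__SSE__)
    vec4 r;
    _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
    return r;
#else
    return add_scalar(a, b);  /* assumption: no SSE, use the scalar loop */
#endif
}

/* Inline asm path: operands are declared, the instruction is opaque text. */
static vec4 add_inline_asm(vec4 a, vec4 b)
{
#if defined(__SSE__) && defined(__GNUC__)
    vec4 r;
    __m128 x = _mm_loadu_ps(a.v);
    __m128 y = _mm_loadu_ps(b.v);
    /* AT&T operand order: addps src, dst. "+x" is a read/write SSE register. */
    __asm__("addps %1, %0" : "+x"(x) : "x"(y));
    _mm_storeu_ps(r.v, x);
    return r;
#else
    return add_scalar(a, b);  /* assumption: no GCC-style asm available */
#endif
}

/* Checks that all three paths agree on one sample input. */
static void check_paths_agree(void)
{
    vec4 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 b = {{10.0f, 20.0f, 30.0f, 40.0f}};
    vec4 s = add_scalar(a, b);
    vec4 i = add_intrinsic(a, b);
    vec4 m = add_inline_asm(a, b);
    for (int k = 0; k < 4; ++k) {
        assert(s.v[k] == i.v[k]);
        assert(s.v[k] == m.v[k]);
    }
}
```

<div>Calling check_paths_agree() from a test should pass on any of the three builds. The results are the same either way; the difference under discussion is only in how much the optimiser knows about each version when it schedules the surrounding code.</div>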

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

So the argument here is that intrinsics in D can more easily be<br>

mapped to existing intrinsics in GCC?<br>

I do understand that this will be pretty difficult for GDC<br>

to implement.<br>

Reminds me that Walter has stated several times how much<br>

better an internal assembler can integrate with the language.</blockquote><div><br></div><div>Basically yes.</div></div>