<div class="gmail_quote">On 6 January 2012 11:04, Andrew Wiley <span dir="ltr"><<a href="mailto:wiley.andrew.j@gmail.com">wiley.andrew.j@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">On Fri, Jan 6, 2012 at 2:43 AM, Walter Bright<br>

<<a href="mailto:newshound2@digitalmars.com">newshound2@digitalmars.com</a>> wrote:<br>

> On 1/5/2012 5:42 PM, Manu wrote:<br>

>><br>

>> So I've been hassling about this for a while now, and Walter asked me to<br>

>> pitch an email detailing a minimal implementation with some initial thoughts.<br>

><br>

><br>

> Takeaways:<br>

><br>

> 1. SIMD behavior is going to be very machine specific.<br>

><br>

> 2. Even trying to do something with + is fraught with peril, as integer adds<br>

> with SIMD can be saturated or unsaturated.<br>

><br>

> 3. Trying to build all the details about how each of the various adds and<br>

> other ops works into the compiler/optimizer is a large undertaking. D would<br>

> have to support internally maybe 100 or more new operators.<br>

><br>

> So some simplification is in order, perhaps a low-level layer that is fairly<br>

> extensible for new instructions, and over which a library can be layered<br>

> for a more presentable interface. A half-formed idea of mine is, taking a<br>

> cue from yours:<br>

><br>

> Declare one new basic type:<br>

><br>

>    __v128<br>

><br>

> which represents the 16-byte-aligned 128-bit vector type. The only<br>

> operations defined to work on it would be construction and assignment. The<br>

> __ prefix signals that it is non-portable.<br>

><br>

> Then, have:<br>

><br>

>   import core.simd;<br>

><br>

> which provides two functions:<br>

><br>

>   __v128 simdop(operator, __v128 op1);<br>

>   __v128 simdop(operator, __v128 op1, __v128 op2);<br>

><br>

> This will be a function built into the compiler, at least for the x86.<br>

> (Other architectures can provide an implementation of it that simulates its<br>

> operation, but I doubt that it would be worth anyone's while to use that.)<br>

><br>

> The operators would be an enum listing of the SIMD opcodes,<br>

><br>

>    PFACC, PFADD, PFCMPEQ, etc.<br>

><br>

> For:<br>

><br>

>    z = simdop(PFADD, x, y);<br>

><br>

> the compiler would generate:<br>

><br>

>    MOV z,x<br>

>    PFADD z,y<br>

><br>

<br>

</div></div>Would this tie SIMD support directly to x86/x86_64, or would it<br>

be possible to also support NEON on ARM (also 128-bit SIMD; see<br>

<a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0409g/index.html" target="_blank">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0409g/index.html</a><br>

)?<br>

(Obviously not for DMD, but if the syntax weren't directly tied to<br>

x86/x86_64, GDC and LDC could support this.)<br>

It seems like using a standard naming convention instead of directly<br>

referencing instructions could let the underlying SIMD instructions<br>

vary across platforms, but I don't know enough about the technologies<br>

to say whether NEON's capabilities match SSE closely enough that they<br>

could be handled the same way.<br>

</blockquote></div><br><div>The underlying architectures are too different to try to map opcodes across them.</div><div>__v128 should map to each architecture's native SIMD type, allowing the compiler to express the hardware, but the opcodes would come from the architecture-specific opcode sets available in each compiler.</div>

<div><br></div><div>As I keep suggesting, LIBRARIES would be created to supply types like float4, int4, etc., which may use version() liberally behind the scenes to target each architecture, providing a common, efficient API at this level.</div>
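<div>A rough sketch of that layering, written in C rather than D, with preprocessor <code>#if</code> blocks standing in for <code>version()</code>. The <code>float4_*</code> names are hypothetical and not taken from any existing library. The SSE path uses the standard <code>xmmintrin.h</code> intrinsics, the NEON path uses <code>arm_neon.h</code>, and anything else falls back to scalar code, so callers see one API whichever backend is compiled in:</div>

```c
#include <assert.h>

/* Hypothetical library-level float4: one API, with the backend picked at
   compile time per architecture (a C stand-in for D's version() blocks). */
#if defined(__SSE__)
#include <xmmintrin.h>
typedef struct { __m128 v; } float4;           /* x86: XMM register */

static inline float4 float4_set(float x, float y, float z, float w) {
    float4 r = { _mm_setr_ps(x, y, z, w) };     /* lane 0 = x */
    return r;
}
static inline float4 float4_add(float4 a, float4 b) {
    float4 r = { _mm_add_ps(a.v, b.v) };        /* ADDPS */
    return r;
}
static inline float4 float4_mul(float4 a, float4 b) {
    float4 r = { _mm_mul_ps(a.v, b.v) };        /* MULPS */
    return r;
}
static inline void float4_store(float out[4], float4 a) {
    _mm_storeu_ps(out, a.v);                    /* unaligned store */
}

#elif defined(__ARM_NEON)
#include <arm_neon.h>
typedef struct { float32x4_t v; } float4;      /* ARM: NEON Q register */

static inline float4 float4_set(float x, float y, float z, float w) {
    float tmp[4] = { x, y, z, w };
    float4 r = { vld1q_f32(tmp) };
    return r;
}
static inline float4 float4_add(float4 a, float4 b) {
    float4 r = { vaddq_f32(a.v, b.v) };
    return r;
}
static inline float4 float4_mul(float4 a, float4 b) {
    float4 r = { vmulq_f32(a.v, b.v) };
    return r;
}
static inline void float4_store(float out[4], float4 a) {
    vst1q_f32(out, a.v);
}

#else
typedef struct { float v[4]; } float4;         /* portable scalar fallback */

static inline float4 float4_set(float x, float y, float z, float w) {
    float4 r = { { x, y, z, w } };
    return r;
}
static inline float4 float4_add(float4 a, float4 b) {
    float4 r;
    for (int i = 0; i < 4; i++) r.v[i] = a.v[i] + b.v[i];
    return r;
}
static inline float4 float4_mul(float4 a, float4 b) {
    float4 r;
    for (int i = 0; i < 4; i++) r.v[i] = a.v[i] * b.v[i];
    return r;
}
static inline void float4_store(float out[4], float4 a) {
    for (int i = 0; i < 4; i++) out[i] = a.v[i];
}
#endif

/* Backend-independent helper built only on the public API above. */
static inline float float4_lane(float4 a, int i) {
    float tmp[4];
    assert(i >= 0 && i < 4);
    float4_store(tmp, a);
    return tmp[i];
}
```

<div>For example, with a = (1, 2, 3, 4) and b = (10, 20, 30, 40), <code>float4_add(float4_mul(a, b), a)</code> gives lanes 11, 42, 93, 164 on every backend. Wrapping the native register type in a struct keeps __m128 and float32x4_t out of user code, so the scalar fallback can expose the same type name.</div>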