SIMD support...

Fri Jan 6 04:47:09 PST 2012

On 6 January 2012 10:43, Walter Bright <newshound2 at digitalmars.com> wrote:

> On 1/5/2012 5:42 PM, Manu wrote:
>
>> So I've been hassling about this for a while now, and Walter asked me to
>> pitch an email detailing a minimal implementation with some initial thoughts.
>>
>
> Takeaways:
>
> 1. SIMD behavior is going to be very machine specific.
>
> 2. Even trying to do something with + is fraught with peril, as integer
> adds with SIMD can be saturated or unsaturated.
>
> 3. Trying to build all the details about how each of the various adds and
> other ops work into the compiler/optimizer is a large undertaking. D would
> have to support internally maybe 100 or more new operators.
>
> So some simplification is in order, perhaps a low level layer that is
> fairly extensible for new instructions, and for which a library can be
> layered over for a more presentable interface. A half-formed idea of mine
> is, taking a cue from yours:
>
> Declare one new basic type:
>
>    __v128
>
> which represents the 16 byte aligned 128 bit vector type. The only
> operations defined to work on it would be construction and assignment. The
> __ prefix signals that it is non-portable.
>
> Then, have:
>
>   import core.simd;
>
> which provides two functions:
>
>   __v128 simdop(operator, __v128 op1);
>   __v128 simdop(operator, __v128 op1, __v128 op2);
>
> This will be a function built in to the compiler, at least for the x86.
> (Other architectures can provide an implementation of it that simulates its
> operation, but I doubt that it would be worth anyone's while to use that.)
>
> The operators would be an enum listing of the SIMD opcodes,
>
>    PFACC, PFADD, PFCMPEQ, etc.
>
> For:
>
>    z = simdop(PFADD, x, y);
>
> the compiler would generate:
>
>    MOV z,x
>    PFADD z,y
>
> The code generator knows enough about these instructions to do register
> assignments reasonably optimally.
>
> What do you think? It ain't beeyoootiful, but it's implementable in a
> reasonable amount of time, and it should make it possible to write tight &
> fast SIMD code without having to do it all in assembler.
>
> One caveat is it is typeless; a __v128 could be used as 4 packed ints or 2
> packed doubles. One problem with making it typed is it'll add 10 more types
> to the base compiler, instead of one. Maybe we should just bite the bullet
> and do the types:
>
>    __vdouble2
>    __vfloat4
>    __vlong2
>    __vulong2
>    __vint4
>    __vuint4
>    __vshort8
>    __vushort8
>    __vbyte16
>    __vubyte16
>

Sounds good to me, though I think __v128 should definitely be typeless,
allowing all those other types to be implemented in libraries. Why wouldn't
you leave that volume of work to libraries?
All those types and their related complications shouldn't be coded into the
language. There's a reason Microsoft chose to only expose __m128 as an
intrinsic. The rest you build yourself.
Also, the libraries for typed vectors can (and will) attempt to support
multiple architectures using version()s behind the scenes.
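
To make the layering concrete, here is a rough sketch of how a typed vector
could sit on top of a typeless 128-bit value in library code. Everything in it
is hypothetical: V128 is a plain struct standing in for the proposed __v128,
SimdOp and its member names are made up for illustration, and simdop() is a
software simulation rather than a compiler built-in. It only shows the shape of
the idea, not how a real implementation would look.

```d
// Hypothetical stand-in for the proposed typeless __v128:
// 16 bytes of storage with 16-byte alignment and no element type of its own.
align(16) struct V128
{
    union
    {
        float[4]  f;   // view as 4 packed floats
        int[4]    i;   // view as 4 packed ints
        ubyte[16] raw; // view as raw bytes
    }
}

// Hypothetical opcode list; a real one would mirror the target's SIMD opcodes.
enum SimdOp { addFloat4, addInt4 }

// Software simulation of the proposed two-operand simdop().
// In the proposal this would be a compiler built-in, not library code.
V128 simdop(SimdOp op, V128 a, V128 b)
{
    V128 r;
    final switch (op)
    {
        case SimdOp.addFloat4:
            r.f[] = a.f[] + b.f[];
            break;
        case SimdOp.addInt4:
            r.i[] = a.i[] + b.i[]; // wrapping (unsaturated) add
            break;
    }
    return r;
}

// The library can pick a backend per architecture behind the scenes.
// Both branches are placeholders here; only the mechanism is shown.
version (X86_64)
    enum backend = "x86-64";
else
    enum backend = "generic";

// Typed wrapper built in the library on top of the typeless value.
struct Float4
{
    V128 v;

    this(float a, float b, float c, float d)
    {
        v.f = [a, b, c, d];
    }

    // The typed layer decides which opcode "+" means for this type.
    Float4 opBinary(string op : "+")(Float4 rhs)
    {
        Float4 r;
        r.v = simdop(SimdOp.addFloat4, v, rhs.v);
        return r;
    }

    float opIndex(size_t n) const { return v.f[n]; }
}

unittest
{
    auto s = Float4(1, 2, 3, 4) + Float4(10, 20, 30, 40);
    assert(s[0] == 11 && s[1] == 22 && s[2] == 33 && s[3] == 44);
}
```

The point of the sketch is that the question of what + means (float add, or
saturated versus unsaturated integer add) is answered by each library type
when it picks an opcode, so the compiler only has to know about one typeless
vector and the raw operations on it.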