std.simd module

Sat Feb 4 21:14:40 PST 2012

On 2/4/2012 7:37 PM, Martin Nowak wrote:
> On 05.02.2012 at 02:13, Manu <turkeyman at gmail.com> wrote:
>
>> On 5 February 2012 03:08, Martin Nowak <dawg at dawgfoto.de> wrote:
>>
>>> Let me restate the main point.
>>> Your approach to a higher level module wraps intrinsics with named
>>> functions.
>>> There is little gain in making simd(AND, f, f2) to and(f, f2) when
>>> you can
>>> easily take this to the level GLSL achieves.
>>>
>>
>> What is missing to reach that level in your opinion? I think I basically
>> offer that (with some more work)
>> It's not clear to me what you object to...
>> I'm not prohibiting the operators, just adding the explicit functions,
>> which may be more efficient in certain cases (they receive the version).
>>
>> Also the 'gains' of wrapping an intrinsic in an almost identical function
>> are, portability, and potential optimisation for hardware versioning. I'm
>> specifically trying to build something that's barely above the intrinsics
>> here, although a lot of the more arcane intrinsics are being collated
>> into
>> their typically useful functionality.
>>
>> Are you just focused on the primitive math ops, or something broader?
>
> GLSL achieves very clear and simple to write construction and conversion
> of values.
>
> I think wrapping the core.simd vector types in an alias this struct
> makes it a snap
> to define conversion through constructors and swizzling through
> properties/opDispatch.
> Then you can overload operands to do the implementation specific stuff
> and add named methods
> for the rest.

The GLSL or HLSL syntax is fairly nice, but it relies on a few 
advantages that are harder to exploit with PC SIMD:

The hardware that runs HLSL can operate on data types 'smaller' than 
the register, either natively or by turning all the instructions into a 
mass of scalar ops that are then run in parallel as best as possible.  
In SIMD land on CPUs the design is much more rigid: we are effectively 
stuck using float and float4 data types, and emulating float2 and 
float3.  For a very long time there was not even a dot product 
instruction, as from Intel's point of view your data is transposed 
incorrectly if you need to do one (plus they would have to handle dot2, 
dot3, dot4 etc).
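
To illustrate the 'transposed' point, here is a minimal sketch in 
portable C++.  Plain arrays stand in for SIMD registers; this is not 
real intrinsics code and not std.simd API, and the names Lanes, Vec3x4 
and dot3x4 are made up for the example.  With one register per 
component, four dot3 results fall out of vertical multiplies and adds, 
and no horizontal dot instruction is needed:

```cpp
#include <array>
#include <cassert>

// One "register" = 4 float lanes. A plain array stands in for a SIMD register.
using Lanes = std::array<float, 4>;

// Lane-wise (vertical) multiply and add: the operations SIMD units do natively.
inline Lanes mul(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] * b[i];
    return r;
}

inline Lanes add(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] + b[i];
    return r;
}

// Hypothetical structure-of-arrays bundle: four 3D vectors stored by component.
struct Vec3x4 {
    Lanes x, y, z;
};

// Four dot3 results at once: lane i holds dot(a_i, b_i).
inline Lanes dot3x4(const Vec3x4& a, const Vec3x4& b) {
    return add(add(mul(a.x, b.x), mul(a.y, b.y)), mul(a.z, b.z));
}

// Usage: lane i of each member belongs to vector i.
inline void dot3x4_example() {
    Vec3x4 a = {{{1.f, 2.f, 3.f, 4.f}}, {{1.f, 1.f, 1.f, 1.f}}, {{0.f, 0.f, 0.f, 2.f}}};
    Vec3x4 b = {{{1.f, 1.f, 1.f, 1.f}}, {{2.f, 2.f, 2.f, 2.f}}, {{5.f, 5.f, 5.f, 5.f}}};
    Lanes d = dot3x4(a, b);
    assert(d[0] == 3.f && d[1] == 4.f && d[2] == 5.f && d[3] == 16.f);
}
```

The catch is that the data has to be laid out this way to begin with, 
which is rarely how a float3 type is written in user code.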

The cost of this emulation of float2 and float3 types is that we have to 
put 'some data' in the unused slots of the SIMD register on swizzle 
operations, which will usually lead to the SIMD instructions generating 
INFs and NaNs in those slots and hurting performance.
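
A small sketch of the unused-slot problem, again in portable C++ with 
an array standing in for the register.  It assumes IEEE 754 floats and 
no fast-math flags; the helper names (make_float3_zero_w, make_float3) 
are hypothetical.  A lane-wise divide touches all four lanes, so a 
float3 padded with zero produces 0/0 in the slot nobody asked for:

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Lanes = std::array<float, 4>;

// Lane-wise divide, as a SIMD divide would do: all four lanes, used or not.
inline Lanes div(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (int i = 0; i < 4; ++i) r[i] = a[i] / b[i];
    return r;
}

// Hypothetical helpers that build a float3 inside a 4-lane register.
// The first zeroes the unused w lane, the second lets the caller pick the filler.
inline Lanes make_float3_zero_w(float x, float y, float z) {
    return {{x, y, z, 0.0f}};
}

inline Lanes make_float3(float x, float y, float z, float w_fill) {
    return {{x, y, z, w_fill}};
}

inline void float3_padding_example() {
    Lanes a = make_float3_zero_w(2.0f, 4.0f, 6.0f);
    Lanes b = make_float3_zero_w(1.0f, 2.0f, 3.0f);

    Lanes q = div(a, b);  // the w lane computes 0/0
    assert(q[0] == 2.0f && q[1] == 2.0f && q[2] == 2.0f);
    assert(std::isnan(q[3]));  // NaN in the unused slot

    // Filling the divisor's w lane with 1 keeps the unused lane finite.
    Lanes b1 = make_float3(1.0f, 2.0f, 3.0f, 1.0f);
    Lanes q1 = div(a, b1);
    assert(q1[3] == 0.0f);
}
```

So a wrapper type has to choose the filler per operation (0 is fine for 
add and mul, not for div), and that choice is exactly the extra work the 
shader hardware does not have to do.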

The other major problem with the shader swizzle syntax is that it 
'doesn't scale'.  If you are using a 128-bit register holding 8 shorts 
or 16 bytes, what are the letters here?  Shaders assume 4 is the limit, 
so you have either xyzw or rgba.  Then there are platform considerations 
(i.e. you can't swizzle 8-bit data on SSE, you have to use a series of 
pack/unpack and shuffles, but VMX can do it easily).
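
One way around the missing letters is to name lanes by index instead.  
The following is only a sketch in portable C++ (std::array standing in 
for the register); shuffle and all_below are hypothetical names, not 
part of any existing library.  Result lane i takes the source lane given 
by the i-th template argument, so the same call shape covers 4 floats, 
8 shorts or 16 bytes:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compile-time check that every index is a valid lane of an N-lane register.
template <std::size_t N>
constexpr bool all_below() { return true; }

template <std::size_t N, std::size_t First, std::size_t... Rest>
constexpr bool all_below() { return First < N && all_below<N, Rest...>(); }

// Hypothetical index-based shuffle: result lane i = source lane Idx[i].
template <std::size_t... Idx, typename T, std::size_t N>
std::array<T, sizeof...(Idx)> shuffle(const std::array<T, N>& v) {
    static_assert(all_below<N, Idx...>(), "shuffle index out of range");
    return {{v[Idx]...}};
}

inline void shuffle_example() {
    // 8 x 16-bit lanes, as in a 128-bit register: reverse them.
    std::array<std::int16_t, 8> s = {{0, 1, 2, 3, 4, 5, 6, 7}};
    auto r = shuffle<7, 6, 5, 4, 3, 2, 1, 0>(s);
    assert(r[0] == 7 && r[7] == 0);

    // The 4-float case that the letters handle: this one is .wzyx
    std::array<float, 4> f = {{1.0f, 2.0f, 3.0f, 4.0f}};
    auto g = shuffle<3, 2, 1, 0>(f);
    assert(g[0] == 4.0f && g[3] == 1.0f);
}
```

This only addresses the naming problem.  It says nothing about codegen: 
a real implementation would still have to map each index pattern to 
whatever shuffle or pack/unpack sequence the target actually offers.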

That said: shader swizzle syntax is very nice, and it can certainly 
reduce the amount of code you write by a huge factor (though the codegen 
is another matter).  Even silly tricks with swizzling literals in HLSL 
are useful, like the following code to sum up some numbers:

if (dot(a, 1.f.xxx) > 0)