primitive vector types

Mon Feb 23 01:02:44 PST 2009

On Mon, Feb 23, 2009 at 5:18 PM, Don <nospam at nospam.com> wrote:
> Mattias Holm wrote:
>>
>> On 2009-02-21 17:03:06 +0100, Don <nospam at nospam.com> said:
>>>
>>> I don't think that's messy at all. I can't see much difference between
>>> special support for float[4] versus float4. It's better if the code can take
>>> advantage of hardware without specific support. Bear in mind that SSE/SSE2
>>> is a temporary situation. AVX provides for much longer arrays of vectors;
>>> and it's extensible. You'd end up needing to keep adding on special types
>>> whenever a new CPU comes out.
>>>
>>> Note that the fundamental concept which is missing from the C virtual
>>> machine is that all modern machines can efficiently perform operations on
>>> arrays of built-in types of length 2^n, for some small value of n.
>>> We need to get this into the language abstraction, rather than follow C++ in
>>> hacking a few extra special types onto the old, deficient C model. And I
>>> think D is actually in a position to do this.
>>>
>>> float[4] would be a greatly superior option if it could be done.
>>> The key requirements are:
>>> (1) need to specify that static arrays are passed by value.
>>> (2) need to keep stack aligned to 16.
>>> The good news is that both of these appear to be done on DMD2-Mac!
>>
>> Yes, float[4] would be ok, if some CPU independent permutation support can
>> be added. Would this be with some intrinsic then or what? I very much like
>> the OpenCL syntax for permutation, but I suppose that an intrinsic such as
>> "float[4] noref permute(float[4] noref vec, int newPos0, int newPos1, int
>> newPos2, int newPos3)" would work as well. Note that this should also work
>> with double[2], byte[16], short[8] and int[4].
>
> Note that if you had static arrays with value semantics, with proper
> alignment, then you could simply create
>
> module std.swizzle;
> float[4] permute(float[4] vec, int newPos0, int newPos1, int newPos2, int newPos3);  /* intrinsic; lane indices are 0-based, as for D arrays */
>
> float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
> float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }
> // etc
>
> ---
> and your code would be:
>
> import std.swizzle;
>
> void main()
> {
>   float[4] t;
>   auto u = t.wzyx;
> }
>
> I don't think this is terribly difficult once the value semantics are in
> place.
> (Note that once you get beyond 4 members, the .xyzw syntax leads to an
> explosion of functions, but I think it's workable at 4: 4! is only 24.
> Beyond that point, you'd probably need direct permute calls.)
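A minimal sketch of the idea in present-day D2, where static arrays do have value semantics. Assumptions: `permute` here is an ordinary scalar function standing in for the proposed intrinsic, lane indices are 0-based, and `std.swizzle` is not a real Phobos module, so everything lives in one file.

```d
import std.stdio : writeln;

// Hypothetical scalar fallback for the proposed permute intrinsic.
// It relies on D2 passing and returning static arrays by value;
// a real intrinsic would instead emit a single shuffle instruction.
float[4] permute(float[4] vec, int p0, int p1, int p2, int p3)
{
    const int[4] idx = [p0, p1, p2, p3];
    float[4] result;
    // Output lane i takes input lane idx[i] (indices are 0-based).
    foreach (i; 0 .. 4)
        result[i] = vec[idx[i]];
    return result;
}

// Named swizzles, in the style of the proposed std.swizzle module.
float[4] wzyx(float[4] q) { return permute(q, 3, 2, 1, 0); }
float[4] xywz(float[4] q) { return permute(q, 0, 1, 3, 2); }

void main()
{
    float[4] t = [1.0f, 2.0f, 3.0f, 4.0f];
    auto u = t.wzyx;  // UFCS: same as wzyx(t)
    assert(u == [4.0f, 3.0f, 2.0f, 1.0f]);
    assert(t == [1.0f, 2.0f, 3.0f, 4.0f]);  // t unchanged: passed by value
    writeln(u);
}
```

Because the indices here are runtime `int`s, a compiler could only lower a call to a single SSE `shufps` when they are compile-time constants (the instruction takes its lane selection as an immediate). Passing them as template arguments, e.g. a hypothetical `permute!(3, 2, 1, 0)(q)`, would make that requirement explicit.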

Actually it's 4^4 = 256 if you do it like OpenCL/GLSL/HLSL/Cg and allow
repeats like .xxyy.
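To make the two counts concrete, a small sketch that enumerates four-letter swizzle names over `xyzw`, with and without repeated components. The function name `swizzleNames` is hypothetical; it only lists names and does not generate the swizzle functions themselves.

```d
import std.stdio : writeln;

// Builds every four-letter swizzle name over "xyzw".
// allowRepeats = true follows the OpenCL/GLSL style (.xxyy is legal);
// allowRepeats = false keeps only permutations (each lane used once).
string[] swizzleNames(bool allowRepeats)
{
    enum comps = "xyzw";
    string[] names;
    foreach (a; 0 .. 4)
        foreach (b; 0 .. 4)
            foreach (c; 0 .. 4)
                foreach (d; 0 .. 4)
                {
                    // Bitmask of lanes used; 0b1111 means all four are distinct.
                    immutable used = (1 << a) | (1 << b) | (1 << c) | (1 << d);
                    if (!allowRepeats && used != 0b1111)
                        continue;
                    char[4] buf = [comps[a], comps[b], comps[c], comps[d]];
                    names ~= buf.idup;
                }
    return names;
}

void main()
{
    auto perms = swizzleNames(false);
    auto all = swizzleNames(true);
    assert(perms.length == 24);   // 4!
    assert(all.length == 256);    // 4^4
    writeln(perms.length, " permutations, e.g. ", perms[0]);
    writeln(all.length, " swizzles with repeats, e.g. ", all[5]);
}
```

In D the 256 functions would not have to be written by hand; a string mixin over these names could generate them, though that step is not shown here.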

--bb