seeding the pot for 2.0 features [small vectors]
Chad J
gamerChad at _spamIsBad_gmail.com
Mon Jan 29 16:49:07 PST 2007
Bill Baxter wrote:
> Mikola Lysenko wrote:
>
>> Bill Baxter wrote:
>>
>>> "Most CPUs today have *some* kind of SSE/Altivec type thing"
>>>
>>> That may be, but I've heard that at least SSE is really not that
>>> suited to many calculations -- especially ones in graphics.
>>> Something like you have to pack your data so that all the x
>>> components are together, and all y components together, and all z
>>> components together. Rather than the way everyone normally stores
>>> these things as xyz, xyz. Maybe Altivec, SSE2 and SSE3 fix that
>>> though. At any rate I think maybe Intel's finally getting tired of
>>> being laughed at for their graphics performance so things are
>>> probably changing.
>>>
>>>
>>
>> I have never heard of any SIMD architecture where vectors work that
>> way. On SSE, Altivec or MMX the components of a vector are always
>> stored in contiguous memory.
>
>
> Ok. Well, I've never used any of these MMX/SSE/Altivec things myself,
> so it was just hearsay. But the source was someone I know in the
> graphics group at Intel. I must have just misunderstood his gripe, in
> that case.
>
>> In terms of graphics, this is pretty much optimal. Most manipulations
>> on vectors like rotations, normalization, cross product etc. require
>> access to all components simultaneously. I honestly don't know why
>> you would want to split each of them into separate buffers...
>>
>> Surely it is simpler to do something like this:
>>
>> x y z w x y z w x y z w ...
>>
>> vs.
>>
>> x x x x ... y y y y ... z z z z ... w w w ...
>
>
>
> Yep, I agree, but I thought that was exactly the gist of what this
> friend of mine was griping about. As I understood it at the time, he
> was complaining that the CPU instructions are good at planar layout x x
> x x y y y y ... but not interleaved x y x y x y.
>
> If that's not the case, then great.
>
> --bb
Seems it's great.
It doesn't really matter what the underlying data is. An SSE
instruction will add four 32-bit floats in parallel, never mind whether
the floats are x x x x or x y z w. What meaning the floats have is up
to the programmer.
Of course, channelwise operations will be faster in planar layout (e.g.
adding 24 to all red values without spending time on the other
channels), while pixelwise operations will be faster in interleaved
(e.g. alpha blending) - but these facts don't have much to do with SIMD.
Maybe the guy from Intel wanted hardware support for planar pixelwise
operations (some mechanism for dereferencing 3-4 different places at
once) or for interleaved channelwise operations (operating on only
every fourth float in an array without needing 4 mov/adds to fill a
128-bit register).
More information about the Digitalmars-d mailing list