seeding the pot for 2.0 features [small vectors]
Bill Baxter
dnewsgroup at billbaxter.com
Mon Jan 29 17:22:53 PST 2007
Chad J wrote:
> Bill Baxter wrote:
>> Mikola Lysenko wrote:
>>
>>> Bill Baxter wrote:
>>>
>> Yep, I agree, but I thought that was exactly the gist of what this
>> friend of mine was griping about. As I understood it at the time, he
>> was complaining that the CPU instructions are good at planar layout x
>> x x x y y y y ... but not interleaved x y x y x y.
>>
>> If that's not the case, then great.
>>
>> --bb
>
> Seems it's great.
>
> It doesn't really matter what the underlying data is. An SSE
> instruction will add four 32-bit floats in parallel, never mind whether
> the floats are x x x x or x y z w. What meaning the floats have is up
> to the programmer.
>
> Of course, channelwise operations will be faster in planar (EX: add 24
> to all red values, don't spend time on the other channels), while
> pixelwise operations will be faster in interleaved (EX: alpha blending)
> - these facts don't have much to do with SIMD.
>
> Maybe the guy from Intel wanted to help planar pixelwise operations
> (some mechanism to help the need to dereference 3-4 different places at
> once) or help interleaved channelwise operations (only operate on every
> fourth float in an array without having to do 4 mov/adds to fill a 128
> bit register).
That could be. I seem to remember now the specific thing we were
talking about was transforming a batch of vectors. Is there a good way
to do that with SSE stuff? I.e., for a 4x4 matrix with rows M1,M2,M3,M4
you want to do something like:
    foreach(i,v; vector_batch)
        out[i] = [dot(M1,v),dot(M2,v),dot(M3,v),dot(M4,v)];
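For concreteness, that loop could be spelled out as the scalar D sketch below. The names Vec4, dot and transformBatch are made up for illustration, and `result` is used instead of `out` because `out` is a reserved word in D. Nothing here uses SSE; it is only the reference behaviour that a SIMD version would have to reproduce.

```d
// Scalar reference sketch. Vec4, dot and transformBatch are hypothetical names.
alias Vec4 = float[4];

// Dot product: four multiplies followed by a "horizontal" sum of the products.
float dot(const Vec4 a, const Vec4 b)
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
}

// Transform every vector in the batch by the 4x4 matrix whose rows are m[0..4].
Vec4[] transformBatch(const Vec4[4] m, const Vec4[] batch)
{
    auto result = new Vec4[batch.length];
    foreach (i, v; batch)
    {
        // One dot product per matrix row gives one output component.
        Vec4 r = [dot(m[0], v), dot(m[1], v), dot(m[2], v), dot(m[3], v)];
        result[i] = r;
    }
    return result;
}
```

Each output vector costs four dot products, so any trouble with the horizontal sum gets paid four times per vector.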
Maybe it had to do with not being able to operate 'horizontally'. E.g.,
to do a dot product you can multiply x y z w times a b c d easily, but
then you need the sum of those four products. Apparently SSE3 has some
horizontal-add instructions (e.g. HADDPS) to help with this case: you
can add x+y and z+w in one step.
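To make the 'horizontal' problem concrete, here is a rough D sketch that assumes nothing about the real instruction set beyond "four lanes at a time". hsum mimics the pairwise pattern (x+y and z+w, then one more add). rowPlanar shows how a planar layout sidesteps the horizontal step: with four vectors stored as xs, ys, zs, ws, one output component for all four vectors needs only lane-wise multiplies and adds. All names are hypothetical. D array operations stand in for the SIMD instructions, and whether a given compiler actually turns them into SSE is not something this sketch guarantees.

```d
// Lanes stands in for one 128-bit register holding four floats (an assumption
// for illustration; this is plain D, not real SSE code).
alias Lanes = float[4];

// Horizontal sum done pairwise: (x+y) and (z+w) first, then add the two halves.
float hsum(const Lanes p)
{
    float xy = p[0] + p[1];
    float zw = p[2] + p[3];
    return xy + zw;
}

// Interleaved layout: one x y z w vector per register, so a dot product is a
// lane-wise multiply followed by a horizontal sum.
float dotInterleaved(const Lanes row, const Lanes v)
{
    Lanes prod;
    prod[] = row[] * v[];   // vertical: all four lanes multiplied together
    return hsum(prod);      // horizontal: collapse four lanes into one value
}

// Planar layout: xs holds the x of four different vectors, ys their y, etc.
// Returns dot(row, v) for all four vectors at once using vertical ops only.
Lanes rowPlanar(const Lanes row, const Lanes xs, const Lanes ys,
                const Lanes zs, const Lanes ws)
{
    Lanes acc;
    acc[]  = xs[] * row[0];
    acc[] += ys[] * row[1];
    acc[] += zs[] * row[2];
    acc[] += ws[] * row[3];
    return acc;
}
```

Calling rowPlanar once per matrix row transforms four vectors with no horizontal step at all, which may be the kind of thing the planar-layout argument was getting at.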
By the way, are there any good tutorials on programming with SIMD
(specifically for Intel/AMD)? Every time I've looked I come up with
pretty much nothing. Googling for "SSE tutorial" doesn't result in much.
As far as making use of SIMD goes (in C++), I ran across this project
that looks very promising, but have yet to give it a real try:
http://www.pixelglow.com/macstl/
--bb