Small Vectors Proposal

janderson askme at me.com
Sat Feb 3 16:19:21 PST 2007


Mikola Lysenko wrote:
> Knud Soerensen wrote:
>> I think it is better to make a very general framework for describing the
>> vector operations, because when we have a very general description 
>> then the compiler is free to optimize as it sees fit.
>> You can think of it as a SQL for vector operations.
> 
> Moreover, I'm not sure why you split up the x-y components.  A much 
> better layout would look like this:
> 
> x y x y x y x y x y ...
> z z z ....
> 
> Now the alignment for each vector is such that you could load 2 x-y 
> pairs into a single SSE register at once.  This is not only better for 
> caching, but it is also better for performance.
> 
> However, neither the layout you proposed nor the one above makes much 
> sense.  Typically you want to do more with vectors than just drop the 
> z-component.  You need to perform matrix multiplication, normalization, 
> dot and cross products etc.  Each of these operates most efficiently 
> when all vector components are packed into a single SIMD register. 
> Therefore it seems clear that the preferred layout ought to be:
> 
> x y z w x y z w ....
> 
> I can't really fathom why you would want to go through the extra trouble 
> of splitting up each component.  Not only does it make arithmetic less 
> efficient, but it also creates a bookkeeping nightmare.  Imagine trying 
> to grow the total number of vectors in your original layout!  You would 
> need to do multiple costly memcpy operations.
> 
> For low dimension vectors, the necessary compiler optimizations are 
> obvious, and there is one clear 'best' data layout.  We don't need any 
> fancy compiler magic, since the code can be directly translated into 
> vector operations just like ordinary arithmetic.

I couldn't agree more with this last part.  However, I must say that for 
'very particular problems' I've found in the past that splitting structs 
(vectors or whatever) up by their component types can give significant 
performance gains when you're doing batch operations on one component, 
because only that component's data has to pass through the cache.

However, this is such a special case that it would be very difficult for 
the compiler to figure out.  It's something I believe the programmer 
should do, not the compiler.
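To make the two layouts concrete, here is a minimal sketch in plain C (no SIMD intrinsics), assuming single-precision 4-component vectors.  All names (Vec4, Vec4SoA, vec4_dot, vec4_cross3, aos_to_soa, sum_z_aos, sum_z_soa) are hypothetical and for illustration only; they are not part of any proposed D syntax.  Vec4 is the packed "x y z w" layout from the quoted message, and Vec4SoA is the split-by-component layout I'm describing.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs ("x y z w x y z w ..."): each vector is 16 contiguous
   bytes, so one vector fits in one 128-bit SSE register. Good for dot,
   cross, normalize, matrix multiply. */
typedef struct {
    float x, y, z, w;
} Vec4;

/* 4-component dot product. */
float vec4_dot(Vec4 a, Vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* 3D cross product on x, y, z; w is treated as padding and set to 0. */
Vec4 vec4_cross3(Vec4 a, Vec4 b)
{
    Vec4 r = {
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x,
        0.0f
    };
    return r;
}

/* Struct-of-arrays: each component lives in its own dense array.
   The caller owns the four arrays, each holding n floats. */
typedef struct {
    float *x, *y, *z, *w;
    size_t n;
} Vec4SoA;

/* Copy n packed vectors into the split layout. */
void aos_to_soa(const Vec4 *src, size_t n, Vec4SoA *dst)
{
    assert(dst->n == n); /* destination arrays must match the count */
    for (size_t i = 0; i < n; ++i) {
        dst->x[i] = src[i].x;
        dst->y[i] = src[i].y;
        dst->z[i] = src[i].z;
        dst->w[i] = src[i].w;
    }
}

/* Batch operation on one component, packed layout: strides 16 bytes
   per vector to use 4 of them, so 3/4 of each cache line is unused. */
float sum_z_aos(const Vec4 *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += v[i].z;
    return s;
}

/* Same operation, split layout: walks one dense array, so every byte
   loaded into the cache is a z value. */
float sum_z_soa(const Vec4SoA *v)
{
    float s = 0.0f;
    for (size_t i = 0; i < v->n; ++i)
        s += v->z[i];
    return s;
}
```

Both sum functions compute the same result; the difference is only in how much memory each one drags through the cache.  Whether that matters depends entirely on the workload, which is why I'd rather leave the choice of layout to the programmer.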

-Joel



More information about the Digitalmars-d mailing list