Small Vectors Proposal

janderson askme at me.com
Thu Feb 1 09:08:30 PST 2007


Mikola Lysenko wrote:
> Benji Smith wrote:
>> Right now, I'm working on a statistical prediction algorithm that uses 
>> *sparse* 300,000-dimensional feature vectors. There's no reason to 
>> limit SSE optimizations to small-dimensionality vectors.
>>
> 
> I can understand your desire to create a universal solution, and for a 
> long time I too was convinced this was the way to go.  However I have 
> come to understand that there are many practical differences between low 
> and high dimension vectors.  Think about it this way: would you try to 
> use a sparse matrix/vector library for doing operations on 4-d vectors 
> and matrices?  It should be clear that these are two very different 
> types of problems.
> 
> If we tried to use pass-by-value semantics and array operations on 
> gigantic vectors, the result would be equally bad.  Obviously, passing a 
> 300k+ element array around by value is going to destroy your 
> performance.  The classic solution is to create a series of custom 
> routines for whichever problem you are trying to solve, or perhaps use a 
> robust optimized library like BLAS for the heavy lifting.
> 
> High dimension linear algebra is a tricky subject, and it can take years 
> of research to get algorithms with competitive performance.  On the 
> other hand, low dimension vectors require relatively trivial compiler 
> support, and are used far more often.  To get a rough estimate of their 
> relative popularity, I invoke Google CodeSearch:
> 
> vector math -sparse : 282k hits
> http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code 
> 
> 
> vector sparse : 31k hits
> http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search
> 
> From this informal statistic, we could infer that low dimensional 
> vectors are used at least an order of magnitude more frequently than 
> their higher dimensional counterparts.
> 
> I do not mean to say that fast sparse vector/matrix operations are 
> unimportant; after all, a fast matrix solver can sometimes shave weeks 
> off your processing time.  In such a situation, it is permissible for a 
> library to be slightly cumbersome - just as long as it gets the job done 
> fast.
> 
> Compare this to small vectors, where many applications place convenience 
> and readability above performance.  Integrating small vectors into the 
> core language would not only improve their usability but should also 
> provide substantial performance gains.

I don't think it's critical that high order matrix multiplications get 
the same performance gains as smaller arrays that use SIMD.  It's 
just important that the same operations behave consistently.

Building vector operations into arrays gives us more power, because we 
get all the standard array operations as well (I concede that these 
operations could be put into the base arrays as well).
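
For example, under the proposed array-operation syntax a small vector 
could just be a static array, and the compiler would be free to lower 
the element-wise expression onto SSE registers.  A minimal sketch of 
the idea (the syntax here is the proposal's, not something the 
compiler guarantees today):

    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [4.0f, 3.0f, 2.0f, 1.0f];
    float[4] c;

    // Element-wise add: a natural candidate for a single SSE addps.
    c[] = a[] + b[];

    // The ordinary array operations come along for free:
    c[0 .. 2] = 0.0f;      // slice assignment
    auto n = c.length;     // length property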

I hope that eventually dynamic arrays would get SIMD support as 
well.  That way you could, for instance, load an entire list of vertexes 
(or whatever) and apply an operation to all of them, as sketched below.  
D could even do this in parallel.  (I concede that you could probably 
do that with an array of these base types.  Also, vectors with 4 
components can be treated somewhat differently, in that the 4th 
component is sometimes used for other purposes.)
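
A sketch of what that might look like, again assuming the proposed 
array-operation syntax extends to dynamic arrays (the translate 
function and the float[4] vertex layout are just my illustration, not 
part of the proposal):

    // Hypothetical: shift every vertex in a list by a fixed offset.
    // Each inner add is a small-vector operation the compiler could
    // map to SIMD, and the outer loop is trivially parallelizable.
    void translate(float[4][] vertexes, float[3] offset)
    {
        foreach (ref v; vertexes)
            v[0 .. 3] += offset[];  // leave the 4th (w) component alone
    }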

I generally think that representing vector types as arrays makes the 
language more powerful than having individual special-case types.

-Joel


