Small Vectors Proposal

Tue Jan 30 15:07:24 PST 2007

Benji Smith wrote:
> Right now, I'm working on a statistical prediction algorithm that uses 
> *sparse* 300,000-dimensional feature vectors. There's no reason to limit 
> SSE optimizations to small-dimensionality vectors.
> 

I can understand your desire to create a universal solution, and for a 
long time I too was convinced this was the way to go.  However I have 
come to understand that there are many practical differences between low 
and high dimension vectors.  Think about it this way: would you try to 
use a sparse matrix/vector library for doing operations on 4-d vectors 
and matrices?  It should be clear that these are two different very 
different types of problems.

If we tried to use pass-by-value semantics and array operations on 
gigantic vectors the result would be equally bad.  Obviously passing a 
300k+ element array around by value is going to destroy your 
performance.  The classic solution is to create a series of custom 
routines for whichever problem you are trying to solve, or perhaps use a 
robust optimized library like BLAS for the heavy lifting.

High dimension linear algebra is a tricky subject, and it can take years 
of research to get algorithms with competitive performance.  On the 
other hand, Low dimension vectors require relatively trivial compiler 
support, and are used far more often.  To get a rough estimate of their 
relative popularity, I invoke Google CodeSearch:

vector math -sparse : 282k hits
http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code

vector sparse : 31k hits
http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search

 From this informal statistic, we could infer that the low dimensional 
vectors are used at least an order of magnitude more frequently than 
their higher order counterparts.

I do not mean to say that fast sparse vector/matrix operations are 
unimportant, after all a fast matrix solver can sometimes shave weeks 
off your processing time.  In such a situation, it is permissible for a 
library to be slightly cumbersome - just as long as it gets the job done 
fast.

Compare this to small vectors, where many applications place convenience 
and readability above performance.  Integrating small vectors into the 
core language will not only improve their usability, but it should also 
provide substantial performance gains.