Small Vectors Proposal
Mikola Lysenko
mclysenk at mtu.edu
Tue Jan 30 15:07:24 PST 2007
Benji Smith wrote:
> Right now, I'm working on a statistical prediction algorithm that uses
> *sparse* 300,000-dimensional feature vectors. There's no reason to limit
> SSE optimizations to small-dimensionality vectors.
>
I can understand your desire to create a universal solution, and for a
long time I too was convinced this was the way to go. However I have
come to understand that there are many practical differences between low
and high dimension vectors. Think about it this way: would you try to
use a sparse matrix/vector library for doing operations on 4-d vectors
and matrices? It should be clear that these are two different very
different types of problems.
If we tried to use pass-by-value semantics and array operations on
gigantic vectors the result would be equally bad. Obviously passing a
300k+ element array around by value is going to destroy your
performance. The classic solution is to create a series of custom
routines for whichever problem you are trying to solve, or perhaps use a
robust optimized library like BLAS for the heavy lifting.
High dimension linear algebra is a tricky subject, and it can take years
of research to get algorithms with competitive performance. On the
other hand, Low dimension vectors require relatively trivial compiler
support, and are used far more often. To get a rough estimate of their
relative popularity, I invoke Google CodeSearch:
vector math -sparse : 282k hits
http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code
vector sparse : 31k hits
http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search
From this informal statistic, we could infer that the low dimensional
vectors are used at least an order of magnitude more frequently than
their higher order counterparts.
I do not mean to say that fast sparse vector/matrix operations are
unimportant, after all a fast matrix solver can sometimes shave weeks
off your processing time. In such a situation, it is permissible for a
library to be slightly cumbersome - just as long as it gets the job done
fast.
Compare this to small vectors, where many applications place convenience
and readability above performance. Integrating small vectors into the
core language will not only improve their usability, but it should also
provide substantial performance gains.
More information about the Digitalmars-d
mailing list