BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston
dac at nospam.com.au
Wed Apr 4 01:53:53 PDT 2007
Witold Baryluk wrote:
> Hello,
>
> Very good work. I'm impressed. Actually I was developing a very
> similar program, but yours is practically all I needed.
> I'm developing a linear algebra package in D, and now I'm optimising
> it for vector machines, so your BLAS1-like package will be
> helpful. :)

Do you intend to implement BLAS2 and BLAS3-like functionality? I feel
that I don't know enough about cache-efficient matrix blocking
techniques to be confident in presenting code. I don't know how
sophisticated the techniques are in libraries like ATLAS, but the ones
used in Blitz++ should be easy to compete with.

Blade is still mostly proof-of-concept rather than being
industrial-strength -- there are so many possibilities for improvement,
it's not well tested, and I haven't even put any effort into making the
code look nice. But as Davidl said, it shows that hard-core back-end
optimisation can now be done in a library at compile time.

The code is quite modularised, so it would be straightforward to add a
check for 'is it SSE-able' (i.e., are all the vectors floats) and 'is it
SSE2-able' (are all the vectors doubles), and if so, send them off to
specialised assemblers, otherwise send them to the x87 one. The
postfix-ing step will involve searching for *+ combinations where FMA
(fused multiply-add) can be used. Knowing the vector length at compile
time, which is possible in many cases, can be a huge advantage for SSE
code, where loop unrolling by a factor of 2 or 4 is always necessary.
I haven't got that far yet.
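
To make the dispatch idea concrete, here is a rough sketch in C++17 (not D,
and not taken from the BLADE source). It picks a backend from the element
types of the vectors in an expression, and does a naive scan of a postfix
string for adjacent *+ pairs. The names Backend, chooseBackend and
countFusable are made up for illustration, and the postfix format (one
character per token) is an assumption.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical backend tags; illustrative only, not BLADE's names.
enum class Backend { X87, SSE, SSE2 };

// Choose a code generator from the element types of every vector
// appearing in one expression. Evaluated entirely at compile time.
template <typename... Elem>
constexpr Backend chooseBackend() {
    static_assert(sizeof...(Elem) > 0, "need at least one vector");
    if constexpr ((std::is_same_v<Elem, float> && ...))
        return Backend::SSE;   // all vectors are floats
    else if constexpr ((std::is_same_v<Elem, double> && ...))
        return Backend::SSE2;  // all vectors are doubles
    else
        return Backend::X87;   // mixed or other types: fall back to x87
}

// Count places where a '*' is immediately followed by a '+' in a
// postfix string with one character per token (assumed format).
// "abc*+" means a + b*c, so the multiply feeds the add directly.
// This deliberately misses forms such as "ab*c+" (a*b + c), where
// the two operators are not adjacent.
constexpr int countFusable(std::string_view postfix) {
    int n = 0;
    for (std::size_t i = 0; i + 1 < postfix.size(); ++i)
        if (postfix[i] == '*' && postfix[i + 1] == '+')
            ++n;
    return n;
}

// Usage: everything below is checked by the compiler, no runtime cost.
static_assert(chooseBackend<float, float>() == Backend::SSE);
static_assert(chooseBackend<double, double, double>() == Backend::SSE2);
static_assert(chooseBackend<float, double>() == Backend::X87);
static_assert(countFusable("abc*+") == 1);
```

In D the same selection would presumably be written with static if inside
a template, but the shape of the decision is the same.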

I'm pretty sure that this type of technique can blow almost everything
else out of the water. <g>