Library for Linear Algebra?

Don nospam at nospam.com
Mon Mar 23 01:17:19 PDT 2009


Fawzi Mohamed wrote:
> On 2009-03-22 09:45:32 +0100, Don <nospam at nospam.com> said:
> 
>> Trass3r wrote:
>>> Don schrieb:
>>>> I abandoned it largely because array operations got into the 
>>>> language; since then I've been working on getting the low-level math 
>>>> language stuff working.
>>>> Don't worry, I haven't gone away!
>>>
>>> I see.
>>>
>>>>>
>>>>>> http://www.dsource.org/projects/lyla
>>>
>>> Though array operations still only give us SIMD and no multithreading 
>>> (?!).
>>
>> There's absolutely no way you'd want multithreading on a BLAS1 
>> operation. It's not until BLAS3 that you become computation-limited.
> 
> Not true; if your vector is large you could still use several threads.

That's surprising. I confess to never having benchmarked it, though.
If the vector is large, all threads are competing for the same L2 and L3 
cache bandwidth, right?
(Assuming a typical x86 situation where every CPU has an L1 cache and 
the L2 and L3 caches are shared).
So multiple cores should never help when the RAM->L3 or L3->L2 
bandwidth is the bottleneck, which will be the case for most 
BLAS1-style operations at large sizes.
And at small sizes, the thread overhead is significant, wiping out any 
potential benefit.
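A back-of-the-envelope check (round numbers, not measurements): daxpy 
does 2 flops per element but moves 24 bytes (load x, load y, store y), 
so a single core sustaining even a couple of Gflop/s would need tens 
of GB/s of memory bandwidth to keep it fed. One core can saturate the 
bus on its own. A BLAS3 matrix multiply, by contrast, does 2*n^3 flops 
on only 3*n^2 doubles, so the flops-per-byte ratio grows with n and 
extra cores genuinely can be fed from cache.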
What have I missed?

> but you are right that using multiple threads at a low level is a 
> dangerous thing, because it might be better to use just one thread 
> and parallelize another operation at a higher level.
> Thus you need some way to know how many threads are really available 
> for that operation.

Yes, if you have a bit more context, it can be a clear win.
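Something like this is what I have in mind (sketch only, assuming a 
parallel-foreach facility like the parallel() below; the names are 
illustrative):

import std.parallelism;   // assumed task facility; any pool with a
                          // parallel foreach would do

// Thread over the independent higher-level pieces; each individual
// array operation stays single-threaded (SIMD only, no contention).
void scaleAll(double[][] rows, double a)
{
    foreach (row; parallel(rows))
        row[] *= a;       // BLAS1-style op, one thread per task
}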

> I am trying to tackle that problem in blip with a global scheduler, 
> which I am rewriting.

I look forward to seeing it!

> 
>>> I think the best approach is lyla's: taking an existing, optimized C 
>>> BLAS library and writing some kind of wrapper using operator 
>>> overloading etc. to make programming easier and more intuitive.
> 
> blip.narray.NArray does that if compiled with -version=blas, but I think 
> that for large vectors/matrices you can do better (precisely by using 
> multithreading).

I suspect that with 'shared' and 'immutable' arrays, D can do better 
than C, in theory. I hope it works out in practice.
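
For reference, the kind of wrapper Trass3r describes is only a few 
lines per operation. A sketch (assumes linking against some CBLAS 
implementation; the extern declaration follows the standard CBLAS 
daxpy signature):

// Sketch of the operator-overloading wrapper idea (lyla-style).
extern (C) void cblas_daxpy(int n, double alpha,
                            const(double)* x, int incx,
                            double* y, int incy);

struct Vector
{
    double[] data;

    // v += w forwards to the optimized BLAS1 kernel.
    ref Vector opOpAssign(string op : "+")(Vector rhs)
    {
        assert(data.length == rhs.data.length);
        cblas_daxpy(cast(int) data.length, 1.0,
                    rhs.data.ptr, 1, data.ptr, 1);
        return this;
    }
}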

> 
>> In my opinion, we actually need matrices in the standard library, with 
>> a very small number of primitive operations built-in (much like 
>> Fortran does). Outside those, I agree, wrappers to an existing library 
>> should be used.
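
To make that concrete, the split I'm imagining looks something like 
this (strawman only, nothing final): element access and element-wise 
operations as built-in primitives, with everything BLAS3-like 
forwarded to an external wrapper:

struct Matrix
{
    double[] data;
    size_t rows, cols;

    // Primitive: element access.
    ref double opIndex(size_t i, size_t j)
    {
        return data[i * cols + j];
    }

    // Primitive: element-wise ops map straight onto array operations.
    Matrix opBinary(string op : "+")(Matrix rhs)
    {
        auto result = Matrix(data.dup, rows, cols);
        result.data[] += rhs.data[];
        return result;
    }

    // NOT a primitive: multiplication would forward to an external
    // optimized library, e.g. a hypothetical gemm() wrapper:
    // Matrix opBinary(string op : "*")(Matrix rhs) { return gemm(this, rhs); }
}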


