Library for Linear Algebra?
Don
nospam at nospam.com
Mon Mar 23 01:17:19 PDT 2009
Fawzi Mohamed wrote:
> On 2009-03-22 09:45:32 +0100, Don <nospam at nospam.com> said:
>
>> Trass3r wrote:
>>> Don schrieb:
>>>> I abandoned it largely because array operations got into the
>>>> language; since then I've been working on getting the low-level math
>>>> language stuff working.
>>>> Don't worry, I haven't gone away!
>>>
>>> I see.
>>>
>>>>>
>>>>>> http://www.dsource.org/projects/lyla
>>>
>>> Though array operations still only give us SIMD and no multithreading
>>> (?!).
>>
>> There's absolutely no way you'd want multithreading on a BLAS1
>> operation. It's not until BLAS3 that you become computation-limited.
>
> Not true: if your vector is large, you could still use several threads.
That's surprising. I confess to never having benchmarked it, though.
If the vector is large, all threads are competing for the same L2 and L3
cache bandwidth, right?
(Assuming a typical x86 situation where every CPU has an L1 cache and
the L2 and L3 caches are shared).
So multiple cores should never be beneficial whenever the RAM->L3 or
L3->L2 bandwidth is the bottleneck, which will be the case for most
BLAS1-style operations at large sizes.
And at small sizes, the thread overhead is significant, wiping out any
potential benefit.
What have I missed?
> but you are right that using multiple threads at a low level is a dangerous
> thing, because it might be better to use just one thread and
> parallelize another operation at a higher level.
> Thus you sort of need to know how many threads are really available for
> that operation.
Yes, if you have a bit more context, it can be a clear win.
> I am trying to tackle that problem in blip by having a global
> scheduler, which I am rewriting.
I look forward to seeing it!
>
>>> I think the best approach is lyla's, taking an existing, optimized C
>>> BLAS library and writing some kind of wrapper using operator
>>> overloading etc. to make programming easier and more intuitive.
>
> blip.narray.NArray does that if compiled with -version=blas, but I think
> that for large vectors/matrices you can do better (exactly by using
> multithreading).
I suspect that with 'shared' and 'immutable' arrays, D can do better
than C, in theory. I hope it works out in practice.
>
>> In my opinion, we actually need matrices in the standard library, with
>> a very small number of primitive operations built-in (much like
>> Fortran does). Outside those, I agree, wrappers to an existing library
>> should be used.
More information about the Digitalmars-d
mailing list