2016Q1: std.blas

Ilya Yaroshenko via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sun Dec 27 08:14:33 PST 2015


On Sunday, 27 December 2015 at 10:28:53 UTC, Russel Winder wrote:
> On Sat, 2015-12-26 at 19:57 +0000, Ilya Yaroshenko via 
> Digitalmars-d-announce wrote:
>> Hi,
>> 
>> I will write GEMM and GEMV families of BLAS for Phobos.
>> 
>> Goals:
>>   - code without assembler
>>   - code based on SIMD instructions
>>   - DMD/LDC/GDC support
>>   - kernel based architecture like OpenBLAS
>>   - 85-100% of the FLOPS of OpenBLAS (100%)
>>   - tiny generic code compared with OpenBLAS
>>   - ability to define user kernels
>>   - allocators support. GEMM requires small internal 
>> allocations.
>>   - @nogc nothrow pure template functions (depends on 
>> allocator)
>>   - optional multithreading
>>   - ability to work with `Slice` multidimensional arrays when
>> the stride between elements in a vector is greater than 1. In
>> common BLAS, the matrix stride between rows or columns is always 1.
>
> Shouldn't the goal of a project like this be to be something 
> that OpenBLAS isn't? Given D's ability to call C and C++ code, 
> it is not clear to me that simply rewriting OpenBLAS in D has 
> any goal for the D or BLAS communities per se. Doesn't stop it 
> being a fun activity for the programmer, obviously, but unless 
> there is something that isn't in OpenBLAS, I cannot see this 
> ever being competition and so building a community around the 
> project.

It depends on what you mean by "something like this". OpenBLAS 
is a _huge_ amount of assembler code. For _each_ platform, for 
_each_ CPU generation, and for _each_ floating point / complex type 
it has a kernel or a few kernels. That is 30 MB of assembler code.

Not only can D code call C/C++, but C/C++ (and so any other 
language) can also call D code. So std.blas may be used in C/C++ 
projects like Julia.

> Now if the threads/OpenCL/CUDA was front and centre so that a 
> goal was to be Nx faster than OpenBLAS, that could be a goal 
> worth standing behind.

It can be a goal for a standalone project. But a standard library 
should be portable to any platform without significant problems 
(especially without problems caused by matrix multiplication). So 
my goal is a tiny and portable project like ATLAS, but fast like 
OpenBLAS. BTW, threads in std.blas would be optional, like in 
OpenBLAS. Furthermore, std.blas will allow users to write their 
own kernels.
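
As a rough illustration of what "kernel based" with user-defined kernels can mean, here is a minimal sketch in plain C. It is not the std.blas design or API: the names `kernel2x2_fn`, `kernel2x2_generic` and `gemm_blocked` are hypothetical, the tile size is fixed at 2x2, matrices are assumed row-major and contiguous, and no SIMD, packing or cache blocking is attempted. The only point is that a small driver walks the output in tiles and delegates the inner work to a replaceable kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-kernel type: adds the product of a 2 x k panel of A
   and a k x 2 panel of B to a 2 x 2 tile of C (all row-major). */
typedef void (*kernel2x2_fn)(size_t k,
                             const double *a, size_t lda,
                             const double *b, size_t ldb,
                             double *c, size_t ldc);

/* Portable reference kernel with no assembler. A tuned build could
   substitute a SIMD version with the same signature. */
void kernel2x2_generic(size_t k,
                       const double *a, size_t lda,
                       const double *b, size_t ldb,
                       double *c, size_t ldc)
{
    double c00 = 0.0, c01 = 0.0, c10 = 0.0, c11 = 0.0;
    for (size_t p = 0; p < k; ++p) {
        double a0 = a[0 * lda + p], a1 = a[1 * lda + p];
        double b0 = b[p * ldb + 0], b1 = b[p * ldb + 1];
        c00 += a0 * b0; c01 += a0 * b1;
        c10 += a1 * b0; c11 += a1 * b1;
    }
    c[0 * ldc + 0] += c00; c[0 * ldc + 1] += c01;
    c[1 * ldc + 0] += c10; c[1 * ldc + 1] += c11;
}

/* Driver: C (m x n) += A (m x k) * B (k x n).
   Assumes m and n are multiples of 2; passing NULL selects the
   generic kernel, otherwise the caller's kernel is used. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *a, const double *b, double *c,
                  kernel2x2_fn kernel)
{
    assert(m % 2 == 0 && n % 2 == 0);
    if (kernel == NULL)
        kernel = kernel2x2_generic;
    for (size_t i = 0; i < m; i += 2)
        for (size_t j = 0; j < n; j += 2)
            kernel(k, a + i * k, k, b + j, n, c + i * n + j, n);
}
```

A caller who has a faster kernel for a particular CPU would pass it as the last argument; everything else stays generic.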

> Not to mention full N-dimension vectors so that D could 
> seriously compete against Numpy in the Python world.

I am not sure how D can compete against Numpy in the Python 
world, but it can compete with Python in the world of programming 
languages. BTW, N-dimensional ranges/arrays/vectors are already 
implemented for Phobos:

PR:
https://github.com/D-Programming-Language/phobos/pull/3397

Updated Docs:
http://dtest.thecybershadow.net/artifact/website-76234ca0eab431527327d5ce1ec0ad74c6421533-fedfc857090c1c873b17e7a1e4cf853c/web/phobos-prerelease/std_experimental_ndslice.html
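
To make the stride goal from the quoted list concrete, here is a minimal sketch in plain C of a matrix-vector product where both matrix strides are free parameters. It does not use ndslice or any std.blas API; the function name `gemv_strided` and its parameter order are hypothetical. Classic BLAS GEMV takes one leading dimension for the matrix and assumes the other stride is 1, so a view such as "every second column" would have to be copied first.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += sum over j of A[i][j] * x[j], where element A[i][j] lives at
   a[i * row_stride + j * col_stride]. Neither stride has to be 1. */
void gemv_strided(size_t m, size_t n,
                  const double *a, ptrdiff_t row_stride, ptrdiff_t col_stride,
                  const double *x, ptrdiff_t incx,
                  double *y, ptrdiff_t incy)
{
    assert(a != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += a[(ptrdiff_t)i * row_stride + (ptrdiff_t)j * col_stride]
                 * x[(ptrdiff_t)j * incx];
        y[(ptrdiff_t)i * incy] += acc;
    }
}
```

For example, with a 2x4 row-major buffer, `row_stride = 4` and `col_stride = 2` select columns 0 and 2 as a 2x2 matrix without copying.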

Please participate in the voting (the time limit has been extended) :-) 
http://forum.dlang.org/thread/nexiojzouxtawdwnlfvt@forum.dlang.org

Ilya
