2016Q1: std.blas
Ilya Yaroshenko via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Sat Dec 26 11:57:19 PST 2015
Hi,
I am going to write the GEMM and GEMV families of BLAS routines for Phobos.
Goals:
- code without assembler
- code based on SIMD instructions
- DMD/LDC/GDC support
- kernel-based architecture, like OpenBLAS
- 85-100% of OpenBLAS FLOPS (taking OpenBLAS as 100%)
- small, generic code base compared with OpenBLAS
- ability to define user kernels
- allocator support. GEMM requires small internal allocations.
- @nogc nothrow pure template functions (depending on the allocator)
- optional multithreading
- ability to work with `Slice` multidimensional arrays when the
stride between elements in a vector is greater than 1. In common
BLAS, one of the matrix strides (between rows or between columns)
always equals 1.
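
To illustrate the last goal, here is a minimal sketch of a GEMV with
an explicit stride for every operand. It is only an illustration, not
the planned std.blas API: the name gemvSketch and its parameter list
are hypothetical, and the loop is a plain scalar reference with no
kernels or SIMD.

```d
// Hypothetical sketch; none of these names come from std.blas.
// Computes y = alpha * A * x + beta * y, where A is m x n.
// Every operand carries its own stride(s), so neither the matrix
// nor the vectors have to be contiguous in memory.
void gemvSketch(T)(
    size_t m, size_t n, T alpha,
    const(T)* a, size_t aRowStride, size_t aColStride,
    const(T)* x, size_t xStride,
    T beta, T* y, size_t yStride) @nogc nothrow pure
{
    foreach (i; 0 .. m)
    {
        T sum = 0;
        foreach (j; 0 .. n)
            sum += a[i * aRowStride + j * aColStride] * x[j * xStride];
        y[i * yStride] = alpha * sum + beta * y[i * yStride];
    }
}

void main()
{
    // 2 x 3 row-major matrix: row stride 3, column stride 1.
    double[6] a = [1, 2, 3, 4, 5, 6];
    // The logical vector [1, 1, 1] is stored with stride 2.
    double[5] x = [1, 0, 1, 0, 1];
    double[2] y = [0, 0];

    gemvSketch!double(2, 3, 1.0, a.ptr, 3, 1, x.ptr, 2, 0.0, y.ptr, 1);

    assert(y[0] == 6 && y[1] == 15);
}
```

A classic BLAS interface would fix aColStride (or aRowStride) to 1;
keeping both as parameters is what lets a strided view be passed
without copying.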
Implementation details:
LDC all        : very generic D/LLVM IR kernels. AVX/AVX2/AVX-512/NEON
                 support works out of the box.
DMD/GDC x86    : kernels for 8 XMM registers based on core.simd
DMD/GDC x86_64 : kernels for 16 XMM registers based on core.simd
DMD/GDC other  : generic kernels without SIMD instructions.
                 AVX/AVX2/AVX-512 support can be added in the future.
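
As a rough sketch of what "generic kernels without SIMD instructions"
and user-definable kernels could look like, the example below passes a
micro-kernel to a GEMM driver as a template alias parameter. All names
(genericKernel, gemmSketch) and the row-major layout are assumptions
made for illustration; the real design may differ, and cache blocking
and packing as described in [1] are left out.

```d
// Hypothetical sketch; not the std.blas API.

// Generic scalar micro-kernel: C[mr x nr] += A[mr x k] * B[k x nr].
// All matrices are row-major with explicit leading dimensions.
void genericKernel(size_t mr, size_t nr, T)(
    size_t k,
    const(T)* a, size_t lda,
    const(T)* b, size_t ldb,
    T* c, size_t ldc) @nogc nothrow pure
{
    // Small accumulator tile; a SIMD kernel would keep it in registers.
    T[nr][mr] acc;
    foreach (ref row; acc)
        row[] = 0;

    foreach (p; 0 .. k)
        foreach (i; 0 .. mr)
            foreach (j; 0 .. nr)
                acc[i][j] += a[i * lda + p] * b[p * ldb + j];

    foreach (i; 0 .. mr)
        foreach (j; 0 .. nr)
            c[i * ldc + j] += acc[i][j];
}

// Driver: tiles C into mr x nr blocks and calls the supplied kernel.
// For brevity, m and n must be multiples of mr and nr.
void gemmSketch(alias kernel, size_t mr, size_t nr, T)(
    size_t m, size_t n, size_t k,
    const(T)* a, const(T)* b, T* c)
{
    assert(m % mr == 0 && n % nr == 0);
    for (size_t i = 0; i < m; i += mr)
        for (size_t j = 0; j < n; j += nr)
            kernel!(mr, nr, T)(k, a + i * k, k, b + j, n, c + i * n + j, n);
}

void main()
{
    double[4] a = [1, 2, 3, 4]; // 2 x 2
    double[4] b = [5, 6, 7, 8]; // 2 x 2
    double[4] c = [0, 0, 0, 0]; // 2 x 2, accumulated into

    // Two 1 x 2 tiles cover the 2 x 2 result.
    gemmSketch!(genericKernel, 1, 2, double)(2, 2, 2, a.ptr, b.ptr, c.ptr);

    assert(c == [19.0, 22.0, 43.0, 50.0]);
}
```

A core.simd or LLVM IR kernel with the same signature could be passed
in place of genericKernel, which is one way the per-compiler variants
listed above might share a single driver.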
References:
[1] Anatomy of High-Performance Matrix Multiplication:
http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
[2] OpenBLAS https://github.com/xianyi/OpenBLAS
Happy New Year!
Ilya