[gsoc] Mir project
Seb
seb at wilzba.ch
Tue Mar 19 11:16:37 UTC 2019
So here's another discussion thread for this year's GSoC. This
time it's about the Mir project.
The wiki already has some information:
https://wiki.dlang.org/GSOC_2019_Ideas#Mir_Project
@community: what do you miss the most in the Mir project?
Also, Ilya recently sent a long email to one student with more
details on the DataFrame project, and I wanted to make it
available to all students:
---
You can choose almost any project you can finish during GSoC.
Common requirements if you choose me as your mentor:
1. You should care about time limits, reports, and GSoC
formalities yourself.
2. I won't spend a lot of time on GSoC. You will need to figure
out almost everything yourself. I will only formulate the final
goal, the API and implementation requirements, and intermediate
goals. If you do cool things, I will help to make them even
better.
3. The final result of the project should have a sensible
positive impact on Mir and D in general. A project should be
completely ready to be accepted.
4. A GSoC project should have professional quality. You will
need to become a professional in the field you choose for your
GSoC project; this is a mandatory requirement.
For example, if you choose to implement basic matrix operations
in D, then the two links to start would be:
- Anatomy of High-Performance Matrix Multiplication
(https://www.cs.utexas.edu/users/flame/pubs/GotoTOMS_final.pdf)
- [Experimental] LLVM-accelerated Generic Linear Algebra
Subprograms (https://github.com/libmir/mir-glas)
To work on GLAS you would need a good understanding of Goto's
paper, LLVM IR, SIMD programming with LDC, and the GLAS source code.
DataFrame project
=================
The mir-algorithm package (https://github.com/libmir/mir-algorithm) has
Slice/ndslice (numpy.ndarray analog) and Series (pandas.Series
analog). Series should be fused into Slice, Slice would be a
generalized multidimensional DataFrame analog. Labels (indexes)
will be optional, the current Slice API and speed will be
preserved. However, this would make the development of generic
libraries hard. To make it simpler, we need to improve the D
language and the DMD compiler. This can be split into two parts:
a language change (DIP) and a pull request with the required
changes in DMD.
The DataFrame GSoC project results will be accepted if you write
the 'clever alias' DIP AND the DIP is approved by Andrei
Alexandrescu and Walter Bright before the end of the GSoC AND you
will also do at least one of the following:
1. Implement the DIP for the DMD compiler. (DMD is written in D,
but I have no idea about its internals) OR
2. Add Labels (Indexes) support to the ndslice package to make
Slice a generalization of DataFrame
It is quite a risky project. Compared with GLAS and FFT, the
DataFrame project also requires very good communication skills, a
lot of patience, and some luck.
Links to start with for DataFrame:
https://issues.dlang.org/show_bug.cgi?id=16486
https://issues.dlang.org/show_bug.cgi?id=16465
The brief DIP idea is that the code like below should work:
alias PackedUpperTriangularMatrix(T) = Slice!(StairsIterator!(T*,
"-"));
// fails, issue 16486
auto foo(T)(PackedUpperTriangularMatrix!T m)
{
}
// Current workaround: it is too much to expect users to
// know what StairsIterator!(T*, "-") is.
auto foo(T)(Slice!(StairsIterator!(T*, "-")) m)
{
}
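The deduction problem can be reproduced without Mir. The sketch below uses two stand-in structs named after the Mir types; they are hypothetical placeholders, not the real Mir definitions, and the function names are made up for illustration. It assumes a compiler still affected by issue 16486, in which case `deducible` is false.

```d
import std.stdio : writeln;

// Hypothetical stand-ins for the Mir types, just enough to show the issue.
struct StairsIterator(Iterator, string direction) { Iterator ptr; }
struct Slice(Iterator) { Iterator iterator; }

alias PackedUpperTriangularMatrix(T) = Slice!(StairsIterator!(T*, "-"));

// Workaround form: the full type is spelled out, so T can be deduced.
string elementName(T)(Slice!(StairsIterator!(T*, "-")) m)
{
    return T.stringof;
}

// Desired form: declared through the alias template.
string elementNameViaAlias(T)(PackedUpperTriangularMatrix!T m)
{
    return T.stringof;
}

void main()
{
    PackedUpperTriangularMatrix!double m;

    // The alias and the spelled-out type name the same type.
    static assert(is(typeof(m) == Slice!(StairsIterator!(double*, "-"))));

    // Deduction works when the parameter type is written out in full.
    assert(elementName(m) == "double");

    // Explicit instantiation through the alias also works.
    assert(elementNameViaAlias!double(m) == "double");

    // Whether T can be deduced through the alias depends on the compiler;
    // with issue 16486 unresolved this is false.
    enum deducible = __traits(compiles, elementNameViaAlias(m));
    writeln("deduction through alias works: ", deducible);
}
```

The 'clever alias' DIP would make the last call form compile, so library users could write the short alias name in function signatures.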
Currently used Slice types in Lubeck / production code:
Slice!(double*) - D slice analog
Slice!(double*, 1, Universal) - BLAS vector, used in mir-lapack
and mir-blas.
Slice!(double*, 2) - Contiguous matrix, that has an efficient
loop for iteration over elements, see mir.algorithm.iteration
sources.
Slice!(double*, 2, Canonical) - BLAS/LAPACK matrix
representation, used in mir-lapack and mir-blas
Slice!(double*, N, Universal) - zero copy view to work with
ndarray in numpy, see also low level API bindings, and high level
bindings
Slice!(StairsIterator!(double*, "+")) and ...
Slice!(StairsIterator!(double*, "-")) - packed storage for
triangular matrices, for BLAS/LAPACK
Slice!(ChopIterator!(size_t*, uint*)); - Memory efficient graph
representation without labels.
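As a rough illustration of what packed triangular storage means, the sketch below computes flat offsets for an upper triangle stored row by row, with rows of length n, n-1, ..., 1. That row layout is an assumption about how the "-" variant is arranged, and the helper names `packedRowStart` and `packedIndex` are hypothetical; this does not use or reproduce the Mir API.

```d
import std.stdio : writeln;

// Offset of the first stored element of row i, assuming rows of
// length n, n-1, ..., 1 are laid out one after another.
size_t packedRowStart(size_t n, size_t i)
{
    assert(i < n);
    // Sum of the lengths of rows 0 .. i-1: i*n - i*(i-1)/2.
    return i * (2 * n - i + 1) / 2;
}

// Flat offset of element (i, j) of the upper triangle (i <= j).
size_t packedIndex(size_t n, size_t i, size_t j)
{
    assert(i <= j && j < n);
    return packedRowStart(n, i) + (j - i);
}

void main()
{
    // 3x3 upper triangle stored as [a00 a01 a02 | a11 a12 | a22].
    double[] packed = [1, 2, 3, 4, 5, 6];
    enum n = 3;

    foreach (i; 0 .. n)
    {
        foreach (j; i .. n)
            writeln("a[", i, "][", j, "] = ", packed[packedIndex(n, i, j)]);
    }
}
```

Only n*(n+1)/2 elements are stored instead of n*n, which is why the row starts are not a fixed stride apart and a dedicated iterator type is needed.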
Possible future Slice types (2019?):
Slice!(double*, 1, Contiguous, string*) - like Pandas Series
Slice!(double*, 2, Contiguous, LabelT1*, LabelT2*) - like Pandas
DataFrame
Slice!(double*, 3, Contiguous, LabelT1*, LabelT2*, LabelT3*) -
like Pandas Panel
Slice!(ChopIterator!(size_t*, uint*), 1, Contiguous, string*); -
Memory efficient graph representation with labels.
Slice!(ChopIterator!(size_t*, Slice!(double*, 1, Contiguous,
uint*))) - Sparse Matrix representation that can be used to
interact with existing C/C++/Fortran libraries
If you are able to write a good DIP and create a pull request
with its implementation, it would be awesome. I can pay $400 as
a bonus if the DIP implementation is merged into DMD.
---