Library Development: What to finish/flesh out?

Thu Mar 17 12:48:20 PDT 2011

dsimcha wrote:

> I've accumulated a bunch of little libraries via various evening and weekend
> hacking projects over the past year or so, in various states of completion.
> Most are things I'm at least half-considering for Phobos, though some belong
> as third-party libs.  I definitely don't have time to finish/flesh out all of
> them anytime soon, so I've decided to ask the community what to prioritize.
> Below is a summary of everything I've been working on, with its current level
> of completion.  Please let me know the following:
> 
> 1.  A relative ordering of how useful you think these libraries would be to
> the community.
> 
> 2.  In absolute terms, would you find this useful?
> 
> 3.  For the Phobos candidates, whether they're general enough to belong in the
> **standard** library.
> 
> List in order from most to least finished:
> 
> 1.  Rational:  A library for handling rational numbers exactly.  Templated on
> integer type, can use BigInts for guaranteed accuracy, or fixed-width integers
> for more speed where the denominator and numerator will be small.  Completion
> state:  Mostly finished.  Just need to fix a little bit rot and submit for
> review.  (Phobos candidate)

I'd find it useful. As for its presence in Phobos, I'm uncertain if it's in enough demand.
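
For anyone who hasn't used such a type, here is a minimal Python sketch of the idea (not dsimcha's code; the class name `Rational` and its fields are made up for illustration). Python integers are arbitrary-precision, so this corresponds to the BigInt-backed variant; a fixed-width version would have to watch for overflow in the multiplications.

```python
from math import gcd


class Rational:
    """Exact rational number kept in lowest terms (illustrative sketch)."""

    def __init__(self, num, den=1):
        if den == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        if den < 0:
            # Keep the sign on the numerator so equal values compare equal.
            num, den = -num, -den
        g = gcd(num, den)  # gcd(0, den) == den, so 0 normalizes to 0/1
        self.num = num // g
        self.den = den // g

    def __add__(self, other):
        return Rational(self.num * other.den + other.num * self.den,
                        self.den * other.den)

    def __mul__(self, other):
        return Rational(self.num * other.num, self.den * other.den)

    def __eq__(self, other):
        return (self.num, self.den) == (other.num, other.den)

    def __repr__(self):
        return f"{self.num}/{self.den}"


print(Rational(1, 3) + Rational(1, 6))  # prints 1/2
```

The point is that 1/3 + 1/6 comes out as exactly 1/2, with no floating-point rounding involved.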

> 2.  RandAA:  A hash table implementation with deterministic memory management,
> based on randomized probing.  Main advantage over builtin AAs is that it plays
> much nicer with the GC and multithreaded programs.  Lookup times are also
> expected O(1) no matter how many collisions exist in modulus hash space, as
> long as there are few collisions in full 32- or 64-bit hash space.  Completion
> state:  Mostly finished.  Just needs a little doc improvement, a few
> benchmarks and submission for review.  (Phobos candidate)

Useful for me and in Phobos.
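
My reading of "randomized probing", sketched in Python: the probe sequence is driven by a pseudo-random generator seeded with the full hash, so two keys that land in the same slot modulo the table size still diverge on the next probe. This is an assumption about the technique, not RandAA's actual implementation; the class name `RandProbeMap`, the LCG constants and the 50% load limit are all illustrative choices, and the sketch leaves out deletion and the deterministic memory management entirely.

```python
_MASK = (1 << 64) - 1  # keep the generator state in 64 bits


class RandProbeMap:
    """Open-addressing map with a pseudo-random probe sequence (sketch)."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity  # each slot: None or (key, value)
        self._count = 0

    def _probe(self, key):
        # Seed the sequence with the full hash, not the hash modulo capacity.
        state = hash(key) & _MASK
        n = len(self._slots)
        yield state % n
        while True:
            # 64-bit linear congruential step; use the high bits, which
            # depend on the whole state, to pick the next slot.
            state = (state * 6364136223846793005 + 1442695040888963407) & _MASK
            yield (state >> 32) % n

    def _find(self, key):
        # Returns the slot holding key, or the first empty slot on its path.
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:
                return i

    def __setitem__(self, key, value):
        if 2 * (self._count + 1) > len(self._slots):
            self._grow()  # keep the table at most half full
        i = self._find(key)
        if self._slots[i] is None:
            self._count += 1
        self._slots[i] = (key, value)

    def __getitem__(self, key):
        slot = self._slots[self._find(key)]
        if slot is None:
            raise KeyError(key)
        return slot[1]

    def _grow(self):
        old = [s for s in self._slots if s is not None]
        self._slots = [None] * (2 * len(self._slots))
        self._count = 0
        for k, v in old:
            self[k] = v
```

With a capacity of 8, the keys 0, 8 and 16 all hit slot 0 first, but their later probes differ because their full hashes differ.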

> 3.  TempAlloc:  A memory allocator based on a thread-local segmented stack,
> useful for allocating large temporary buffers in things like numerics code.
> Also comes with a hash table, hash set and AVL tree optimized for this
> allocation scheme.  The advantages over plain old stack allocation are that
> it's independent of function calls (meaning you can return pointers to
> TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning
> you can allocate huge buffers w/o risking stack overflow.  Its main weakness
> is that this stack is not scanned by the GC, meaning that you can't store the
> only reference to a GC-allocated piece of memory here.  However, in practice
> large arrays of primitives are an extremely common case in
> performance-critical code.  I find this module immensely useful in dstats and
> Lars Kyllingstad uses it in SciD.  Getting it into Phobos would make it easy
> for other scientific/numerics code to use it.  Completion state:  Working and
> used.  Needs a little cleanup and documentation.  (Phobos candidate)

Useful for me; I don't know about everyone else.
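
To illustrate the segmented-stack idea for those who haven't seen it, a rough Python sketch follows. It is not TempAlloc's API: `SegmentedStack`, `mark` and `release` are hypothetical names, and the sketch ignores thread-locality and alignment. Allocations bump a pointer within the current segment, a new segment is added when the current one is full, and releasing a mark frees everything allocated after it, independently of function call boundaries.

```python
class SegmentedStack:
    """Stack-style allocator built from fixed-size segments (sketch)."""

    def __init__(self, segment_size=1024):
        self.segment_size = segment_size
        self.segments = [bytearray(segment_size)]
        self.top = 0  # offset of the next free byte in the last segment

    def alloc(self, nbytes):
        if nbytes > self.segment_size:
            raise ValueError("request larger than one segment")
        if self.top + nbytes > self.segment_size:
            # Current segment is full: start a fresh one instead of
            # overflowing, which is what makes the stack "segmented".
            self.segments.append(bytearray(self.segment_size))
            self.top = 0
        view = memoryview(self.segments[-1])[self.top:self.top + nbytes]
        self.top += nbytes
        return view

    def mark(self):
        # Remember the current position so it can be restored later.
        return (len(self.segments), self.top)

    def release(self, mark):
        # Drop every segment added after the mark and rewind the offset.
        nseg, top = mark
        del self.segments[nseg:]
        self.top = top


def make_buffer(stack, n):
    # The buffer outlives this function call, unlike a plain stack array.
    return stack.alloc(n)
```

A caller would take a mark, call helpers like `make_buffer` as often as needed, and release the mark when the temporary data is no longer used.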

> 4.  Streaming CSV Parser:  Parses CSV files as they're read in, a few
> convenience functions for extracting columns into structs.  If Phobos ever
> gets SQLite support I'll probably add sugar for turning a CSV file into an
> SQLite database, too.  Completion state:  Prototype working, needs testing,
> cleanup and documentation.  (Phobos candidate)

You mean a lazy slurp? It'd be useful for everyone.
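
What I have in mind by "lazy", as a small Python sketch using the standard `csv` module: rows are parsed one at a time as the consumer asks for them, and each row is turned into a record. The record type `Reading` and the column names `name` and `value` are invented for the example and have nothing to do with dsimcha's prototype.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Reading(NamedTuple):
    """Hypothetical record type standing in for a user-defined struct."""
    name: str
    value: float


def stream_records(source: TextIO) -> Iterator[Reading]:
    """Yield one Reading per CSV row without loading the whole file."""
    reader = csv.reader(source)
    header = next(reader)  # first row names the columns
    name_col = header.index("name")
    value_col = header.index("value")
    for row in reader:
        # Only the current row is held in memory at this point.
        yield Reading(row[name_col], float(row[value_col]))


data = io.StringIO("name,value\na,1.5\nb,2\n")
for record in stream_records(data):
    print(record.name, record.value)
```

Because `stream_records` is a generator, the same loop works on a file of any size; nothing is read until the consumer iterates.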

> 5.  Matrix operations:  SciD improvements that allow you to write matrix
> operations that look like normal math/MATLAB and optimizes them via expression
> templates so that a minimal number of temporary matrices are created.
> Uses/will use BLAS for multiplication.  Completion state:  Addition
> implemented.  Multiplication not.

It is worth considering standardizing at least matrix expressions in Phobos. The motivation is analogous to ranges -- to run an algorithm from lib A on a matrix container from lib B. C++ would be green with envy.

I'd be glad to be part of the effort once I'm done with xml.
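
A rough sketch of what I mean by matrix expressions, in Python for brevity. D or C++ would build the expression tree at compile time with templates, whereas this builds it at run time; all names (`Expr`, `Matrix`, `Sum`, `evaluate`) are invented for the example. `a + b + c` creates no intermediate matrix. The sum is computed element by element only when the expression is evaluated, and anything that offers `shape` and `at(i, j)` can take part, whichever library it comes from.

```python
class Expr:
    """Base class: anything with .shape and .at(i, j) is an expression."""

    def __add__(self, other):
        return Sum(self, other)  # build a node, compute nothing yet


class Matrix(Expr):
    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.shape = (len(self.rows), len(self.rows[0]))

    def at(self, i, j):
        return self.rows[i][j]


class Sum(Expr):
    """Lazy elementwise sum of two expressions."""

    def __init__(self, lhs, rhs):
        if lhs.shape != rhs.shape:
            raise ValueError("shape mismatch")
        self.lhs, self.rhs = lhs, rhs
        self.shape = lhs.shape

    def at(self, i, j):
        return self.lhs.at(i, j) + self.rhs.at(i, j)


def evaluate(expr):
    """Materialize an expression in a single pass over its elements."""
    n_rows, n_cols = expr.shape
    return Matrix([[expr.at(i, j) for j in range(n_cols)]
                   for i in range(n_rows)])


a = Matrix([[1, 2], [3, 4]])
b = Matrix([[10, 20], [30, 40]])
c = Matrix([[100, 200], [300, 400]])
result = evaluate(a + b + c)  # one result matrix, no temporaries
print(result.rows)  # prints [[111, 222], [333, 444]]
```

Standardizing only the small interface (`shape` and element access here) is what would let algorithms and containers from different libraries interoperate, much as ranges do.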

> 6.  Machine learning:  Decision trees, KNN, Random Forest, Logistic
> Regression, SVM, Naive Bayes, etc.  This would be a dstats module.  Completion
> state:  Decision trees prototyped, logistic regression working.

I'd find it useful, and I think anyone who's into this would too.

> 7.  std.mixins:  Mixins for commonly needed boilerplate code.  I stopped
> working on this when Andrei suggested that making a collection of mixins into
> a module is a bad idea.  I've thought about it some more and I respectfully
> disagree.  std.mixins would be a one-stop shop for pretty much any boilerplate
> you need to inject, and most of this code doesn't fit in any other obvious
> place.  Completion state:  A few things (struct comparison, simple class
> constructors, Singleton pattern) prototyped.  (Phobos candidate)

I'm afraid I also think functionality should be categorized by the purpose it serves rather than by implementation technique.

> 8.  GZip support in std.file:  I'll leave the stream stuff for someone else,
> but just simple stuff like read(), write(), append() IMHO belongs in std.file.
>  Completion state:  Not started, but this is the easiest of the bunch to
> implement.  (Phobos candidate)

I don't know really...

-- 
Tomek