Kinds of containers

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Wed Oct 21 06:11:38 PDT 2015


On 10/21/2015 09:01 AM, Russel Winder via Digitalmars-d wrote:
> I believe (currently, I am open to being re-educated) that most of the
> rest of the world calls these persistent data structures, despite the
> title of Okasaki's thesis. I am not sure I like either of the labels
> "functional" or "persistent", there is little affordance with the
> concept underneath.

OK. We should go with the more widespread name.
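
For concreteness, here's a minimal sketch of what "persistent" buys 
you (the names are mine, not a proposed API): operations return new 
versions that share structure with the old ones, and no version is 
ever mutated in place.

import std.stdio;

// Minimal persistent singly-linked list: cons() allocates a new head
// node and shares the existing tail, so every earlier version of the
// list remains valid and unchanged.
struct List(T)
{
    private static class Node
    {
        T head;
        Node tail;
        this(T h, Node t) { head = h; tail = t; }
    }

    private Node root;

    List!T cons(T value) { return List!T(new Node(value, root)); }

    void print()
    {
        for (auto n = root; n !is null; n = n.tail)
            write(n.head, " ");
        writeln();
    }
}

void main()
{
    List!int nil;
    auto xs = nil.cons(3).cons(2).cons(1); // 1 2 3
    auto ys = xs.cons(0);                  // 0 1 2 3, shares xs's nodes
    xs.print();                            // xs is unaffected by ys
    ys.print();
}

Okasaki's structures are fancier, but structural sharing is the 
common thread.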

>> By default eager value containers are mutable. They should support
>> immutable and const meaningfully.
>
> Is there a need for separate eager structures explicitly? Is it not
> enough for all structures to be lazy with there being a pull operation?

Well, it is, but I wanted to clarify where STL-style containers fit.

> N-dimensional arrays, N-dimensional dynamic array, singly-linked list,
> doubly-linked list, hash map (aka dictionary, associative array,…),
> red-black tree, 2-3 tree, doubly-linked tree, tree map, b-tree. And no
> doubt a few others will be proposed since they are classic, appear in
> undergraduate courses, and are actually used for real.

These are orthogonal to the kinds I discussed. So we have container 
topologies (you enumerate a few of them) and container kinds, which 
concern the copying semantics of entire containers.
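
To make the distinction concrete, here's a rough sketch (ValueVec and 
RefVec are illustrative names only, not proposed containers): copying 
a value-kind container duplicates its payload, while copying a 
reference-kind container shares it.

import std.stdio;

// A value-kind container: assigning it copies the stored elements.
struct ValueVec(T)
{
    private T[] data;

    this(this) { data = data.dup; }   // copying duplicates the storage

    void put(T x) { data ~= x; }
    const(T)[] items() const { return data; }
}

// A reference-kind container: assigning it shares the same object.
class RefVec(T)
{
    private T[] data;

    void put(T x) { data ~= x; }
    const(T)[] items() const { return data; }
}

void main()
{
    ValueVec!int a;
    a.put(1);
    auto b = a;        // deep copy via postblit
    b.put(2);
    writeln(a.items);  // [1]
    writeln(b.items);  // [1, 2]

    auto c = new RefVec!int;
    c.put(1);
    auto d = c;        // same underlying object
    d.put(2);
    writeln(c.items);  // [1, 2]
}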

> However these are classic sequential data structures.

Hashes and some of the trees would actually be associative data structures.

> We should be
> looking to support data structures that can be used in a multi-threaded
> context without having explicit locks. The obvious exemplars to
> indicate the way are ConcurrentHashMap in Java and NumPy array. (cf.
> other thread talking about these quasi-weird array data structures.)
> NumPy array is an opaque data structure that should never be accessed
> in any way other than through the functions and higher-order functions from the
> library. It is a nice design, works well enough for data scientists,
> quants, bioinformatics people to think it is great, but fundamentally
> is seriously slow. If D could provide data structures that did
> NumPy/SciPy/Pandas things significantly faster than NumPy/SciPy/Pandas,
> there would be traction.

Cool thoughts, thanks. Those would have their own APIs, separate from 
mainstream containers.
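
Just to gesture at the flavor, here's a rough sketch using nothing 
but built-in slice operations and std.parallelism -- not a design 
proposal, and a far cry from a real n-dimensional API:

import std.parallelism : parallel;
import std.range : iota;
import std.stdio;

void main()
{
    auto a = new double[1_000_000];
    auto b = new double[1_000_000];
    a[] = 1.5;
    b[] = 2.0;

    // Whole-array, element-wise arithmetic without an explicit loop.
    auto c = new double[a.length];
    c[] = a[] * b[] + 0.5;

    // Parallel element-wise transform spread across worker threads.
    foreach (i; parallel(iota(c.length)))
        c[i] = c[i] * c[i];

    writeln(c[0], " ", c[$ - 1]);
}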

> Then there is the question of whether to try to do the sort of thing
> Chapel and X10 are doing. They have the ideas of domains and places,
> which allow for partitioned arrays that can be mapped to clusters of
> multicore, multiprocessor machines — Partitioned Global Address Space.
> IBM have now released their APGAS stuff over Hazelcast for Java users.
> This will begin to challenge Apache Spark (and Hadoop).

We don't support PGAS at the language level, but I'll need to look into how 
APGAS works with Java.

> Obviously I am conflicted here: I use Go for networking (but mostly
> because I earn money running learning workshops), quite like Rust
> because it is the fastest language currently on my microbenchmarks,
> like D because it's nice, love Chapel because it is great (better than
> D in so many ways), sort of like X10 (because it is a bit Scala like
> and can be JVM or native), but end up having to work on some C++
> codebases. I am not entirely sure D can compete with Chapel in HPC and
> increasingly the Python-verse. Laeeth and others are championing D
> instead of Python, but really the competition is Python/Chapel vs.
> Python/NumPy/SciPy/Pandas. Unless, that is, D can get Pandas-style
> data structures. Which brings my ramblings back on point.

My finance folks also rave about Pandas. I wish I could fork myself to 
look into it.

Nevertheless, what I'm working on now should help libraries such as a 
Pandas analog for D. We have a large language but lack clear, 
recipe-style idioms for writing library code.


Andrei


