Kinds of containers

Russel Winder via Digitalmars-d digitalmars-d at puremagic.com
Wed Oct 21 06:01:50 PDT 2015


On Wed, 2015-10-21 at 07:05 -0400, Andrei Alexandrescu via Digitalmars-
d wrote:
> 
[…]
> 1. Functional containers.
> 
> These are immutable; once created, neither their topology nor their 
> elements may be observably changed. Manipulating a container entails 
> creating an entire new container, often based on an existing
> container 
> (e.g. append takes a container and an element and creates a whole new
> container).
> 
> Internally, functional containers take advantage of common
> substructure 
> and immutability to share actual data. The classic resource for
> defining 
> and implementing functional containers is 
> http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0
> 521663504.

I believe (currently, I am open to being re-educated) that most of the
rest of the world calls these persistent data structures, despite the
title of Okasaki's thesis. I am not sure I like either of the labels
"functional" or "persistent", there is little affordance with the
concept underneath.

> 2. Reference containers.
> 
> These have classic reference semantics (à la Java). Internally, they
> may 
> be implemented either as class objects or as reference counted
> structs.
> 
> They're by default mutable. Qualifiers should apply to them
> gracefully.
> 
> 3. Eager value containers.
> 
> These are STL-style. Somewhat surprisingly I think these are the
> worst 
> of the pack; they expensively duplicate at the drop of a hat and need
> to 
> be carefully passed around by reference lest performance silently
> drops. 
> Nevertheless, when used as members inside other data structures value
> semantics might be the appropriate choice. Also, thinking of them as 
> values often makes code simpler.
> 
> By default eager value containers are mutable. They should support 
> immutable and const meaningfully.

Is there a need for separate eager structures explicity? Is it not
enough for all structures to be lazy with there being a pull operation?

> 4. Copy-on-write containers.
> 
> These combine the advantages of value and reference containers: you
> get 
> to think of them as values, yet they're not expensive to copy.
> Copying 
> only occurs by necessity upon the first attempt to change them.
> 
> The disadvantage is implementations get somewhat complicated. Also,
> they 
> are shunned in C++ because there is no proper support for COW; for 
> example, COW strings have been banned starting with C++11 which is
> quite 
> the bummer.
> 
> Together with Scott Meyers, Walter figured out a way to change D to 
> support COW properly. The language change consists of two attributes.
> 
> =======
> 
> I'll attempt to implement a few versions of each and see what they
> look 
> like. The question here is what containers are of interest for D's 
> standard library.

N-dimensional arrays, N-dimensional dynamic array, singly-linked list,
doubly-linked list, hash map (aka dictionary, associative array,…),
red-black tree, 2-3 tree, doubly-linked tree, tree map, b-tree. And no
doubt a few others will be proposed since they are classic, appear in
undergraduate courses, and are actually used for real.

However these are classic sequential data structures. We should be
looking to support data structures that can be used in a multi-threaded 
context without having explicit locks. The obvious exemplars to
indicate the way are ConcurrentHashMap in Java and NumPy array. (cf.
other thread talking about these quasi-weird array data structures.)
NumPy array is an opaque data structure that should never be accessed
in any way other than the functions and higher-order functions from the
library. It is a nice design, works well enough for data scientists,
quants, bioinformatics people to think it is great, but fundamentally
is seriously slow. If D could provide data structures that did
NumPy/SciPy/Pandas things significantly faster than NumPy/SciPy/Pandas,
there would be traction. 

In this sense std.concurrent, std.parallelism, and std.datastructures
(?) whilst almost certainly remaining separate, must be able to
interwork.

The alternative is the sort of hack that goes on in Java where data
structures are sequential but you can provide thread-safe versions.
This is fine for small data structures, but is not acceptable for large
ones in a multicore situation.  Hence ConcurrentHashMap, and the ilke.

Then there is the question whether to try and do the sort of thing
Chapel and X10 are doing. They have the ideas of domain and places
which allow for partitioned arrays that can be mapped to clusters of
multicore, multiprocessor machines — Partitioned Global Address Space.
IBM have now released their APGAS stuff over Hazelcast for Java users.
This will begin to challenge Apache Spark (and Hadoop).

Apache Spark is in many ways like NumPy in it's architecture, an opaque
data N-dimensional array operated on by functions and higher-order
functions.

Obviously I am conflicted here: I use Go for networking (but mostly
because I earn money running learning workshops), quite like Rust
because it is the fastest language currently on my microbenchmarks,
like D because it's nice, love Chapel because it is great (better than
D in so many ways), sort of like X10 (because it is a bit Scala like
and can be JVM or native), but end up having to work on some C++
codebases. I am not entirely sure D can compete with Chapel in HPC and
increasingly the Python-verse. Laeeth and others are championing D
instead of Python, but really the competition is Python/Chapel vs.
Python/NumPy/SciPy/Pandas. Unless that is D can get the Pandas style
data structures. Which brings my ramblings back on point. 

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder at ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel at winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: This is a digitally signed message part
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20151021/51973b3b/attachment.sig>


More information about the Digitalmars-d mailing list