dcollections version 0.02

Wed Aug 6 08:23:15 PDT 2008

Steven Schveighoffer:
> I'm assuming you are talking about STL's deque?

I am talking about Python collections.deque:
http://docs.python.org/lib/deque-objects.html
The purpose of my libs is yet another, (often but not always) to mimic the ease of use of Python ;-)

> I don't have anything exactly.  You can implement a stack with the LinkList class.

A normal linked list may be too much slow for this because it allocates too many small structures, and iterating on it can be slow for low space efficiency and low cache coherence.

> If you end up finishing your deque, I'll gladly take a look at adding it to 
> dcollections.

Very good :-) But I don't know how much well its style can fit in. For example I have already written an hash set into my libs, but its API is modeled around the Python sets API...

> Actually, it wouldn't work to provide O(1) lookup.  I think you need 
> probably 2 arrays of arrays to provide the O(1) prepend, one that goes 
> 'down' instead of 'up'.
> I'll think about how this could be implemented.

I think there are two good ways to implement a mutable deque data structure:

1) Dynamic array of pointers to fixed-size arrays that I call blocks.
2) Unrolled linked list (double linked, usually), that is a chain of fixed-size arrays, where you add/remove items only at the tail/head and not in the middle.

Image of the first implementation:
http://pages.cpsc.ucalgary.ca/~kremer/STL/1024x768/deque.jpeg

(If you are using immutable data structures you need much more complex stuff, like finger trees).

Python deque is implemented as the second one, the deque of the STL is often the first one.

They have different advantages and disadvantages, so ideally you may create a collection that includes both implementations (but I presume no one does this). And both can be tuned changing the block size (a smart data structure can keep few runtime statistics of its usage and adapt its block size to the specific usage as time goes on, I'll think about this).

They are both fast in iteration, probably just about 2-2.5 times slower than iterating over a normal array.

The second one is probably a bit simpler to implement. The bigger difference is that in the second one the access time to random items isn't O(1), but if the blocks are large enough this isn't a big problem.

The first implementation may be a bit slower for the normal deque usage, because once in a while you have to reset/tidy the main dynamic array (see below).

In both implementations you have to keep two pointers or two lengths that keep how much items are contained into the first and last fixed-size arrays (in a generic unrolled linked list implementation you can have holes everywhere, so you have to store how many items are present in each block, and you have to merge/split adjacent blocks too much empty/filled up, this makes the management less easy. A deque makes things simpler).

In the first implementation the dynamic array can be managed as a circular deque, so you may need a modulus operation each time you access an item randomly, this requires a bit of time more than accessing items sequentially in the second implementation.

When the circular deque is too much empty/filled up, you have to create a new one and copy data, but this generally requires little time because this dynamic arrays is 300/1000/10000 times smaller than the total number of items inserted into the deque. So there's probably no need to invent more complex management schemes for this, like you say.

Bye,
bearophile