How should I sort a doubly linked list the D way?

Tue Aug 13 21:11:41 UTC 2019

On Tue, Aug 13, 2019 at 08:42:37PM +0000, Max Haughton via Digitalmars-d-learn wrote:
> On Tuesday, 13 August 2019 at 18:54:58 UTC, H. S. Teoh wrote:
[...]
> > These days, with CPU multi-level caches and memory access
> > predictors, in-place arrays are often the best option for
> > performance, up to a certain size.  In general, avoid indirection
> > where possible, in order to avoid defeating the memory access
> > predictor and reduce the number of cache misses.
[...]
> I saw a Bjarne Stroustrup talk where he benchmarked that the for n >
> 1, std::vector was a lot faster than a linked list for all supported
> operations. I don't know how clever the caching strategies are on a
> modern processor (Pointer chasing), but to my knowledge the only way
> of getting a cache efficient linked list would be to effectively have
> a very contiguous allocator (Which obviously defeats the purpose of
> using a list in the first place)
> 
> Found it: https://www.youtube.com/watch?v=YQs6IC-vgmo

A contiguous allocator doesn't help after the list has undergone a large
number of insertions/deletions, because of fragmentation.

You can still get a cache-efficient linked list by chunking. I.e.,
instead of a linked-list of individual elements, group the elements into
blocks (arrays) that are then linked into a linked list.  Then you can
still insert/delete elements with sub-O(1) complexity by updating only
the affected block, and only occasionally you need to delete a block or
allocate a new block.

As long as the block size is suitably-sized, it will be cache-coherent
most of the time and enjoy the associated performance boosts from the
cache hierarchy.  For small lists, it will be equivalent to using a
single array. For very large lists, you'll also reap the cheap
insert/delete flexibility of a linked list.

To minimize space wastage from unused slots in the blocks, you could
have a list rebalancing based on some desired utilization factor (e.g.,
if <50% of a block is used, redistribute elements among adjacent
blocks). Redistribution should be pretty cache-coherent (except perhaps
for some corner cases), because it mostly involves linear copying of
elements from one block to another.

T

-- 
I am not young enough to know everything. -- Oscar Wilde