Iterators and Ranges: Comparing C++ to D to Rust

Mon Jun 14 17:49:46 UTC 2021

On Monday, 14 June 2021 at 17:19:35 UTC, Steven Schveighoffer 
wrote:
> Oh for sure. Generalizing the relationships that nodes have 
> would be pretty difficult. Trees need 2 pointers at least 
> (possibly a parent pointer as well), whereas a linked list only 
> needs one (if it's not bidirectional).

Yes, my idea is to keep the chunk-header within two cache lines 
(or one), but there should be auxillary space there for the 
container as well. So when you insert a chunk-list into a new 
container there is meta-info about the nature of the data (are 
all chunks filled up, data sorted etc), so the container can then 
decide whether to run over the chunks and "fix" the 
structure/invariants. So if you take a singled linked list and 
insert it into a double linked one with a hash index, then the 
double linked list will run over it and update backpointers and 
stuff in key-hash-values.

However, in my next attempt at this I will try to make it SIMD 
friendly, I think. So using a struct-of-arrays with maybe a 
bitmask that says whether a slot is occupied. So each chunk would 
have a 32, 64 or 256 element capacity or something like that.

Then maybe a table pointer would be a pointer to the chunk + a 
bitmask? But then it makes more sense to use a ranges approach. 
So you point to first chunk with bitmask and the last chunk with 
bitmask. So pointing to the end would be pointing to the last 
chunk with a zero-bitmask? Not sure… :-D

It is funny how implementation choices for containers changes how 
one thinks about table pointers/iterators/ranges.

> Unless you allocate nodes for the actual spaces, you have to 
> deal with this kind of stuff. I felt like pointing at an 
> element, and then storing whether that data is valid or not was 
> a pretty reasonable way to have element pointers without safety 
> problems.

Yes, but how does it affect efficiency?

It really depends on how they are used, for instance, if you 
provide all the algorithms yourself, then inefficent 
table-pointers/cursors can be turned into internal raw pointers 
and then it does not matter as much (as they only are begin/end 
markers that are mutated infrequently).

> My original use case for making dcollections was to have a 
> fast-lookup map (using RBTree), and be able to remove items as 
> I iterate it the tree. Using ranges for this just doesn't make 
> sense in the general case, but using cursors works well.

Yes. I am concerned about how a struct-of-arrays datastructure 
approach affects these kind of considerations.

Of course, there is also the question of whether the LDC/GDC 
backend are able to compile to bitmasked SIMD instructions 
(bitmasks that says whether an element is present or not)?? I 
wonder if that is at all possible or if I will have to write 
explicit SIMD code?