Need for (C++20) Contiguous Range
Ola Fosheim Grøstad
ola.fosheim.grostad at gmail.com
Fri Oct 9 13:55:19 UTC 2020
On Friday, 9 October 2020 at 13:31:18 UTC, Steven Schveighoffer
wrote:
> We are talking about a range that is also contiguous in memory.
> The only logical reason to require this is to get a ptr and a
> length and use the CPU specialized instructions for these
> things. We have that type pattern, it's a slice. C++ didn't
> have that formally defined, so they needed to add something.
Yes, C++ has data(), but that does not imply that you provide an
array interface. Actually, I'm not even sure it implies that the
elements are in order... but it probably does now that they
require random-access properties.
Either way, you want a SIMD aligned slice. I guess if one had
some kind of interface where you fetchUnaligned first and from
there fetchAligned, and the finish with fetchUnaligned could work.
A better solution is to have a bitmask that tells you whether the
element is active or not. Then you just provide an aligned
pointer to the range-algorithm. But this isn't going to work for
safe code.
Modern SIMD on x86 has this built in.
This also means that you can continue to use SIMD operations
after filter.
So when you "pop" slices you might get a bit mask sequence like
this
00001111 11111111 11100000
(I show 8 elements per slice here, probably should be configured
to 2, 4, 8 16, 32, 64, 128, 256)
Ok, so then you have a filter than will return every other
element, then you get the same slices, but a different mask
pattern:
00001010 10101010 10100000
Pretty neat actually.
This would also work with randomly populated buffers (like a
hash).
> And when I got to thinking about the performance of such
> buffers (like, let's say a ring buffer that has 2 slices over
> the same underlying buffer, so you never have to copy), the
> cost of *indexing* is going to overwhelm the benefit of not
> having to copy. I even implemented a ringbuffer using memory
> map tricks, and it's not any better than a straight array.
I've literally spent weeks implementing various (some threadsafe)
ringbuffers. It can be surprisingly challenging when you start to
deal with slices and "transactional-like" features. Simple
concept, but it can be a challenge to get up to speed and also
being sure that it is correct.
There are some neat tricks one can use when implementing
ringbuffers that are 2^n depending on what operations you desire.
One important question is whether you always require one element
to be unused or not, that tiny detail can have a major impact on
the implementation :-D.
Anyway, when implementing a ring buffer, only include the
operations you actually need. The more generic it is, the more
overhead you'll get. Sadly. It is mind-blowing how many different
implementation strategies there are. (for such a trivial concept)
It is nearly impossible to create a best-of-breed library
implementation for that reason. Very sensitive to the use context.
But this is getting off-topic...
> Contiguous memory has the properties that allow the CPU to
> shine, and the way to say "I have contiguous memory" in D is:
> use a slice.
But that kinda implies that the buffer is indexable and in order.
More information about the Digitalmars-d
mailing list