Challenge: write a reference counted slice that works as much as possible like a built-in slice

Thu Nov 11 23:11:23 UTC 2021

On 2021-11-08 20:12, deadalnix wrote:
> On Monday, 8 November 2021 at 22:38:27 UTC, Andrei Alexandrescu wrote:
>>> shared_ptr does atomic operations all the time. The reality is that on 
>>> modern machines, atomic operations are cheap *granted there is no 
>>> contention*. It will certainly limit what the optimizer can do, but 
>>> all in all, it's almost certainly better than keeping the info around 
>>> and doing a runtime check.
>>
>> In my measurements, uncontested atomic increments are 2.5x or more 
>> slower than the equivalent non-atomic increment.
>>
> 
> Do you mind sharing this?

Quick and dirty code that's been long overwritten. Just redo it. Use C++ 
as a baseline.

> I find that curious, because load/stores on x86 are almost sequentially 
> consistent by default, and you don't even need sequential consistency to 
> increment the counter to begin with, so a good old `inc` instruction is 
> enough.
> 
> I'd like to look at what the compiler is doing here, because maybe we 
> are trying to fix the wrong problem.

The overhead comes from the bus "lock" operation which both gcc and 
clang emit: https://godbolt.org/z/zx4cMYE39
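For anyone redoing the measurement, here is a minimal sketch in C++. It is not the original benchmark (that code is gone), and every name in it (`plain_loop`, `atomic_loop`, `seconds_for`, `atomic_to_plain_ratio`) is made up for illustration. The atomic loop uses `memory_order_relaxed`, the weakest ordering; on x86-64 gcc and clang still compile that read-modify-write to a `lock`-prefixed instruction. The plain loop goes through a `volatile` so the optimizer cannot fold it into a single addition, which makes it only a rough baseline.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: n plain increments. The volatile qualifier forces a
// real load/add/store per iteration instead of one folded addition.
std::uint64_t plain_loop(std::uint64_t n) {
    volatile std::uint64_t counter = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        counter = counter + 1;
    }
    return counter;
}

// Hypothetical helper: n uncontended atomic increments with relaxed ordering.
// Assumption: the compiler does not elide or merge the atomic operations.
std::uint64_t atomic_loop(std::uint64_t n) {
    std::atomic<std::uint64_t> counter{0};
    for (std::uint64_t i = 0; i < n; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
    return counter.load(std::memory_order_relaxed);
}

// Times one call of f(n) and checks that it really counted to n.
template <typename F>
double seconds_for(F f, std::uint64_t n) {
    auto start = std::chrono::steady_clock::now();
    std::uint64_t result = f(n);
    auto stop = std::chrono::steady_clock::now();
    assert(result == n);
    (void)result;  // silences the unused warning when NDEBUG is defined
    return std::chrono::duration<double>(stop - start).count();
}

// Ratio of atomic time to plain time; a value above 1 means the atomic
// increment was slower on this machine and this run.
double atomic_to_plain_ratio(std::uint64_t n) {
    double plain = seconds_for(plain_loop, n);
    double atomic = seconds_for(atomic_loop, n);
    return atomic / plain;
}
```

Calling `atomic_to_plain_ratio` with a large count (say, a hundred million) and optimizations enabled gives a single-threaded, uncontended ratio. The result depends heavily on the CPU, the compiler flags, and how the baseline is written, so treat it as a starting point and not as a reproduction of the 2.5x figure.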