Discussion Thread: DIP 1028--Make @safe the Default--Final Review

Steven Schveighoffer schveiguy at gmail.com
Wed Apr 1 02:43:09 UTC 2020


Nice and interesting writeup, Joe!

I might be able to shine some light here:

On 3/31/20 5:26 PM, Joseph Rushton Wakeling wrote:
> 
> I don't think anybody ever did work out exactly what the problem had 
> been in the early days, but it's likely relevant that by the time I 
> broke the rules, the company had been using 64-bit DMD for a long time.  
> IIRC what was suspected (N.B. this is from memory and from someone who 
> is not an expert on the internals of the GC:-) was that with the 32-bit 
> GC there was something about the size of GC pools or memory chunks that 
> meant that it was very likely that you could wind up with a chunk of GC 
> memory where all of it was in principle recyclable except for a couple 
> of bytes, and hence you would allocate new chunks and then the same 
> thing would happen with them, and so on until you were using far more 
> chunks than should really have been needed.

The biggest problem in 32-bit land is that the address space is so 
small. A conservative GC treats anything that might be a pointer as a 
pointer. So depending on how the system lays out your memory, ordinary 
integers have a real chance of "pinning" memory: some int on a stack 
somewhere happens to match the address of a live GC block, and is 
treated as a pointer holding that memory back from collection. If that 
memory has pointers in it, it probably has ints too, and those ints are 
in turn treated as pointers, so even more memory can be "caught". As 
your available address space shrinks, the odds of false pinnings get 
higher, so it's a vicious cycle.

With a 64-bit address space, allocations typically land far away from 
the values ordinary integers take, so false pinning is much rarer.
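
To make the false-pinning mechanism concrete, here is a minimal sketch 
(the block size and variable names are just for illustration):

import core.memory : GC;

void main()
{
    void* p = GC.malloc(1024);          // some GC-managed block
    size_t disguised = cast(size_t) p;  // an integer whose bit pattern
                                        // happens to equal the address
    p = null;                           // no real pointer remains

    GC.collect();
    // Scanning the stack, a conservative GC sees `disguised`, cannot
    // prove it isn't a pointer, and keeps the 1024-byte block alive.
    // In a 32-bit process, arbitrary ints collide with heap addresses
    // like this by accident far more often than in a sparse 64-bit
    // address space.
    assert(disguised != 0);  // keep the local live past the collection
}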

I'm not sure if this matches your exact problem, but I am certain that 
64-bit D is much less likely to leak GC memory than 32-bit D.

> The team that grew out of the app I was working on never did have to 
> really care about GC issues, but ironically I did wind up rewriting that 
> same app to make a lot more use of recyclable buffers, though not 
> preallocation.  I don't recall that it was ever really _necessary_, 
> though: it was more of a precaution to try and ensure the same memory 
> consumption for D1 and D2 builds of the same app, given that D2's GC 
> seemed happy to allocate a lot more memory for the same "real" levels of 
> use.  Most likely D2 just allowed the size of the GC heap to grow a lot 
> more before triggering a collection, but we were hyper-cautious about 
> getting identical resource usage just on the offchance it might have 
> been something nastier.

This I'm sure I can answer :) It is actually something I added to the 
runtime -- the non-stomping array feature. In D1, an array was 
appendable in place only if it pointed at the beginning of its memory 
block, and there was no assumeSafeAppend. So if you, for instance, 
allocated 16 bytes, you got exactly a 16-byte block from the GC.

But the drawback was that you could unintentionally overwrite memory 
that was still referenced elsewhere.
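
For example, here is a sketch (in D2 syntax, but imagine the D1 rule 
in effect) of how an in-place append could stomp a sibling slice:

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 2];  // slice starting at the block's beginning

    // Under the D1 rule, b points at the start of the block, so this
    // append would write in place -- clobbering a[2], which is still
    // referenced through a.
    b ~= 99;
}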

With the non-stomping feature, the "used" length of the array is 
stored in the block as well (at the end of the block). This lets the 
array runtime know when it's safe to append in place and when a new 
block has to be allocated. That is especially necessary for immutable 
data such as strings, since overwriting still-accessible immutable 
data is undefined behavior in D2.
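
Continuing that sketch in D2: the stored length makes the same append 
safe, and assumeSafeAppend (from druntime's object module) opts back 
into in-place appending when you know nothing else references the 
data -- the basis of the recyclable-buffer pattern Joe described:

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 2];

    b ~= 42;            // b's end (2) != stored "used" length (3), so
                        // the runtime copies b to a fresh block
    assert(a[2] == 3);  // a is untouched

    // When you know no other slice references the tail, opt back in:
    a.length = 0;
    a.assumeSafeAppend();
    a ~= 7;             // reuses the original block, no new allocation
}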

The drawback, though, is that allocating a 16-byte array really needs 
17 bytes (one byte for the array length stored in the block), which 
ends up allocating a 32-byte block, since GC blocks come in powers 
of 2.
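
You can observe the rounding from user code via the array's capacity 
property; exact numbers depend on the GC implementation, so treat the 
output as illustrative:

import std.stdio : writeln;

void main()
{
    auto a = new ubyte[](16);  // request 16 bytes of data
    // 16 data bytes plus 1 metadata byte don't fit in a 16-byte block,
    // so the GC hands back a 32-byte block. capacity reports how many
    // elements fit before the next reallocation.
    writeln(a.capacity);       // typically prints more than 16
}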

Since then, we have also started storing the TypeInfo in the block 
when the data has a destructor, meaning even less space for actual 
data.

So this probably explains why a D2 app consumes a bit more memory than 
an equivalent D1 app.

-Steve

