On heap segregation, GC optimization and @nogc relaxing

Tue Nov 11 18:34:54 PST 2014

Hi all,

I want to get back on the subject of ownership, lifetime and 
propose some solution, but before, propose to state the problem 
in a way that haven't seen before (even if I have no doubt some 
have came to the same conclusion in the past).

The problem at hand is double: memory management and thread 
safety. Number one has been a hot topic for ages, and number 2 
has become very over the past years, to the widespreading of 
multicores CPU.

The problem at hand here is ownership of data. There are 3 roads 
you can go about it:
  - immutability and GC. Effectively, these 2 technique allow you 
to get rid of ownership. There are advantages and drawbacks i'm 
going to discuss later.
  - Being unsafe and rely on convention. This is the C++ road (and 
a possible road in D). It allow to implement almost any wanted 
scheme, but come at great cost for the developer.
  - Annotations. This is the Rust road. It also come a great cost 
for the developer, as some schemes may be non trivial to express 
granted the type system, but, contrary to the C++ road, is safe.

These approach all have some very nice things going on for them, 
but also some killer scenarios.

Immutability+GC allow to have safety while keeping interfaces 
simple. That is of great value. It also come with some nice 
goodies, in the sense that is it easy and safe to shared data 
without bookkeeping, allowing one to fit more in cache, and 
reduce the amount of garbage created. Most text processing apps 
fall into this category and this is why D is that good at them. 
Another big goodies is that many lock free algorithm become 
possible. Once you remove the need for bookkeeping of ownership 
many operations can be implemented in an atomic manner. 
Additionally, it is possible to implement various GC optimization 
on immutable heap, which make the GC generally more efficient. 
But the cost is also real. For some use case, this mean having a 
large amount of garbage generated (Carmack wrote a piece on 
haskell were he mention the disastrous effect that having a 
framebuffer immutable would have: you'd have to clone it 
everytime you draw in it, which is a no go). GC also tend to 
cause unpredictable runtime characteristics, which programs with 
real time constraint can have hard time to deal with.

Relying on convention has the advantage that any scheme can be 
implemented without constraint, while keeping interface simple. 
The obvious drawback is that it is time consuming and error 
prone. It also make a lot of things unclear, and dev choose the 
better safe than sorry road. That mean excessive copying to make 
sure one own the data, which is wasteful (in term of work for the 
copy itself, garbage generation and cache pressure). If this must 
be an option locally for system code, it doesn't seems like this 
is the right option at program scale and we do it in C++ simply 
because we have to.

Finally, annotations are a great way to combine safety and speed, 
but generally come at a great cost when implenting uncommon 
ownership strategies where you ends up having to express complex 
lifetime and ownership relations.

Ideally, we want to map with what the hardware does. So what does 
the hardware do ?

Multicore CPU have various cores, each of them having layers of 
cache. Cache is organized in cache line and each cache line can 
be in various modes. Actual system are quite complex and deal 
with problems we are not very interesting here (like writeback) 
but the general idea is that every cache line is owned with 
different modes.

Either the cache line is owned by a single core and can be 
written to, or the cache line shared by several cores, each of 
them having a local copy of the line, but none of them can write 
to. There is an internal bus where cores can exchange cache line 
with each other and messages to acquire cache line in read or 
read/write mode. That mean CPU are good at thread local 
read/write, shared immutable and transfer of ownership from one 
core to the other. They are bad at shared writable data (as 
effectively, the cache line will have to bounce back and forth 
between cores, and all memory access will need to be serialized 
instead of performed out of order).

In that world, D has a bizaro position were it use a combination 
of annotations (immutable, shared) and GC. Ultimately, this is a 
good solution. Using annotation for common cases, fallback on 
GC/unsafe code when these annotations fall short.

Before going into why it is fallign short, a digression on GC and 
the benefits of segregating the heap. In D, the heap is almost 
segregated in 3 groups: thread local, shared and immutable. These 
group are very interesting for the GC:
  - Thread local heap can be collected while disturbing only one 
thread. It should be possible to use different strategy in 
different threads.
  - Immutable heap can be collected 100% concurrently without any 
synchronization with the program.
  - Shared heap is the only one that require disturbing the whole 
program, but as a matter of good practice, this heap should be 
small anyway.

Various ML family languages (like OCaml) have adopted segregated 
heap strategy and get great benefice out of it. For instance, 
OCaml's GC is known to outperform Java's in most scenarios.

We are sitting on a huge GC goldmine here, but 3 things prevent 
us to exploit it:
  - Exceptions. They can bubble from one thread to the other and 
create implicit sharing.
  - Uniqueness (as it is defined now) as it allow for unique 
object to be merged with any heap.
  - message passing. Ownership transfert is not possible and so 
unsafe casting ensue.

  * It has to be noted that delegate allow as well for this kind 
of stunt, but this is recognized as a bug by now and hopefully it 
is gonna be fixed.

D has a type qualifier system for which we pay a big price. 
Getting everything const correct is difficult. We'd want to get 
the most bang for the buck. One of the bang we are not far to be 
able to get is segregating the heap. That mean shitty GC and 
unsafe code.

Let's present a concrete exemple using ownership:
pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulation to 
come up with the object. These include various allocations. These 
allocation are mutable into foo, which makes it impossible to 
allocate them on the immutable heap (as a GC relying on this 
immutability could mess up things pretty bad). They also cannot 
be allocated on the TL heap as once promoted to immutable, the 
data become shared as well.

On the other hand, ownership means that the compiler can know 
when things go out of scope and free them explicitly. Which is a 
plus as generating less garbage is always a way to improve 
garbage collection. The most efficient work there is is the one 
that do not need to be done.

I'd argue for the introduction of a basic ownership system. 
Something much simpler than rust's, that do not cover all uses 
cases. But the good thing is that we can fallback on GC or unsafe 
code when the system show its limits. That mean we rely less on 
the GC, while being able to provide a better GC.

We already pay a cost at interface with type qualifier, let's 
make the best of it ! I'm proposing to introduce a new type 
qualifier for owned data.

Now it means that throw statement expect a owned(Throwable), that 
pure function that currently return an implicitly unique object 
will return owned(Object) and that message passing will accept to 
pass around owned stuff.

The GC heap can be segregated into island. We currently have 3 
types of islands : Thread local, shared and immutable. These are 
builtin island with special characteristics in the language. The 
new qualifier introduce a new type of island, the owned island.

owned island can only refers to other owned island and to 
immutable. they can be merged in any other island at any time 
(that is why they can't refers to TL or shared).

owned(T) can be passed around as function parameter or returned, 
or stored as fields. When doing so they are consumed. When an 
owned is not consumed and goes out of scope, the whole island is 
freed.

That means that owned(T) can implicitly decay into T, 
immutable(T), shared(T) at any time. When doing so, a call to the 
runtime is done to merge the owned island to the corresponding 
island. It is passed around as owned, then the ownership is 
transferred and all local references to the island are 
invalidated (using them is an error).

On an implementation level, a call to a pure function that return 
an owned could look like this :

{
   IslandID __saved = gc_switch_new_island();
   scope(exit) gc_restore_island(__saved);

   call_pure_function();
}

This allow us to rely much less on the GC and allow for a better 
GC implementation.

@nogc . Remember ? It was in the title. What does a @nogc 
function look like ? a no gc function o not produce any garbage 
or trigger the collection cycle. there is no reason per se to 
prevent the @nogc code to allocate on the GC as long as you know 
it won't produce garbage. That mean the only operation you need 
to ban are the one that merge the owned things into TL, shared or 
immutable heap.

This solves the problem of the @nogc + Exception. As Exception 
are isolated, they can be allocated, throw and catched into @nogc 
code without generating garbage. They can safely bubble out of 
the @nogc section of the code and still be safe.

The same way, it open the door for a LOT of code that is not 
@nogc to be. If the code allocate memory in an owned island and 
return it, then it is now up to the caller to decide whether is 
want's it garbage collected or keep it as owned (and/or make it 
reference counted for instance).

The solution of passing a policy at compile for allocation is 
close to what C++'s stdlib is doing, and even if the proposed 
approach by Andrei is better, I don't think this is a good one. 
The proposed approach allow for a lot of code to be marked as 
@nogc and allow for the caller to decide. That is ultimately what 
we want libraries to look like.