On heap segregation, GC optimization and @nogc relaxing

Tue Nov 11 19:13:15 PST 2014

On 12/11/2014 3:34 p.m., deadalnix wrote:
> Hi all,
>
> I want to get back on the subject of ownership, lifetime and propose
> some solution, but before, propose to state the problem in a way that
> haven't seen before (even if I have no doubt some have came to the same
> conclusion in the past).
>
> The problem at hand is double: memory management and thread safety.
> Number one has been a hot topic for ages, and number 2 has become very
> over the past years, to the widespreading of multicores CPU.
>
> The problem at hand here is ownership of data. There are 3 roads you can
> go about it:
>   - immutability and GC. Effectively, these 2 technique allow you to get
> rid of ownership. There are advantages and drawbacks i'm going to
> discuss later.
>   - Being unsafe and rely on convention. This is the C++ road (and a
> possible road in D). It allow to implement almost any wanted scheme, but
> come at great cost for the developer.
>   - Annotations. This is the Rust road. It also come a great cost for
> the developer, as some schemes may be non trivial to express granted the
> type system, but, contrary to the C++ road, is safe.
>
> These approach all have some very nice things going on for them, but
> also some killer scenarios.
>
> Immutability+GC allow to have safety while keeping interfaces simple.
> That is of great value. It also come with some nice goodies, in the
> sense that is it easy and safe to shared data without bookkeeping,
> allowing one to fit more in cache, and reduce the amount of garbage
> created. Most text processing apps fall into this category and this is
> why D is that good at them. Another big goodies is that many lock free
> algorithm become possible. Once you remove the need for bookkeeping of
> ownership many operations can be implemented in an atomic manner.
> Additionally, it is possible to implement various GC optimization on
> immutable heap, which make the GC generally more efficient. But the cost
> is also real. For some use case, this mean having a large amount of
> garbage generated (Carmack wrote a piece on haskell were he mention the
> disastrous effect that having a framebuffer immutable would have: you'd
> have to clone it everytime you draw in it, which is a no go). GC also
> tend to cause unpredictable runtime characteristics, which programs with
> real time constraint can have hard time to deal with.
>
> Relying on convention has the advantage that any scheme can be
> implemented without constraint, while keeping interface simple. The
> obvious drawback is that it is time consuming and error prone. It also
> make a lot of things unclear, and dev choose the better safe than sorry
> road. That mean excessive copying to make sure one own the data, which
> is wasteful (in term of work for the copy itself, garbage generation and
> cache pressure). If this must be an option locally for system code, it
> doesn't seems like this is the right option at program scale and we do
> it in C++ simply because we have to.
>
> Finally, annotations are a great way to combine safety and speed, but
> generally come at a great cost when implenting uncommon ownership
> strategies where you ends up having to express complex lifetime and
> ownership relations.
>
> Ideally, we want to map with what the hardware does. So what does the
> hardware do ?
>
> Multicore CPU have various cores, each of them having layers of cache.
> Cache is organized in cache line and each cache line can be in various
> modes. Actual system are quite complex and deal with problems we are not
> very interesting here (like writeback) but the general idea is that
> every cache line is owned with different modes.
>
> Either the cache line is owned by a single core and can be written to,
> or the cache line shared by several cores, each of them having a local
> copy of the line, but none of them can write to. There is an internal
> bus where cores can exchange cache line with each other and messages to
> acquire cache line in read or read/write mode. That mean CPU are good at
> thread local read/write, shared immutable and transfer of ownership from
> one core to the other. They are bad at shared writable data (as
> effectively, the cache line will have to bounce back and forth between
> cores, and all memory access will need to be serialized instead of
> performed out of order).
>
> In that world, D has a bizaro position were it use a combination of
> annotations (immutable, shared) and GC. Ultimately, this is a good
> solution. Using annotation for common cases, fallback on GC/unsafe code
> when these annotations fall short.
>
> Before going into why it is fallign short, a digression on GC and the
> benefits of segregating the heap. In D, the heap is almost segregated in
> 3 groups: thread local, shared and immutable. These group are very
> interesting for the GC:
>   - Thread local heap can be collected while disturbing only one thread.
> It should be possible to use different strategy in different threads.
>   - Immutable heap can be collected 100% concurrently without any
> synchronization with the program.
>   - Shared heap is the only one that require disturbing the whole
> program, but as a matter of good practice, this heap should be small
> anyway.
>
> Various ML family languages (like OCaml) have adopted segregated heap
> strategy and get great benefice out of it. For instance, OCaml's GC is
> known to outperform Java's in most scenarios.
>
> We are sitting on a huge GC goldmine here, but 3 things prevent us to
> exploit it:
>   - Exceptions. They can bubble from one thread to the other and create
> implicit sharing.
>   - Uniqueness (as it is defined now) as it allow for unique object to
> be merged with any heap.
>   - message passing. Ownership transfert is not possible and so unsafe
> casting ensue.
>
>   * It has to be noted that delegate allow as well for this kind of
> stunt, but this is recognized as a bug by now and hopefully it is gonna
> be fixed.
>
> D has a type qualifier system for which we pay a big price. Getting
> everything const correct is difficult. We'd want to get the most bang
> for the buck. One of the bang we are not far to be able to get is
> segregating the heap. That mean shitty GC and unsafe code.
>
> Let's present a concrete exemple using ownership:
> pure Object foo() { ... }
> immutable o = foo();
>
> This is valid code. However, foo can do arbitrary manipulation to come
> up with the object. These include various allocations. These allocation
> are mutable into foo, which makes it impossible to allocate them on the
> immutable heap (as a GC relying on this immutability could mess up
> things pretty bad). They also cannot be allocated on the TL heap as once
> promoted to immutable, the data become shared as well.
>
> On the other hand, ownership means that the compiler can know when
> things go out of scope and free them explicitly. Which is a plus as
> generating less garbage is always a way to improve garbage collection.
> The most efficient work there is is the one that do not need to be done.
>
> I'd argue for the introduction of a basic ownership system. Something
> much simpler than rust's, that do not cover all uses cases. But the good
> thing is that we can fallback on GC or unsafe code when the system show
> its limits. That mean we rely less on the GC, while being able to
> provide a better GC.
>
> We already pay a cost at interface with type qualifier, let's make the
> best of it ! I'm proposing to introduce a new type qualifier for owned
> data.
>
> Now it means that throw statement expect a owned(Throwable), that pure
> function that currently return an implicitly unique object will return
> owned(Object) and that message passing will accept to pass around owned
> stuff.
>
> The GC heap can be segregated into island. We currently have 3 types of
> islands : Thread local, shared and immutable. These are builtin island
> with special characteristics in the language. The new qualifier
> introduce a new type of island, the owned island.
>
> owned island can only refers to other owned island and to immutable.
> they can be merged in any other island at any time (that is why they
> can't refers to TL or shared).
>
> owned(T) can be passed around as function parameter or returned, or
> stored as fields. When doing so they are consumed. When an owned is not
> consumed and goes out of scope, the whole island is freed.
>
> That means that owned(T) can implicitly decay into T, immutable(T),
> shared(T) at any time. When doing so, a call to the runtime is done to
> merge the owned island to the corresponding island. It is passed around
> as owned, then the ownership is transferred and all local references to
> the island are invalidated (using them is an error).
>
> On an implementation level, a call to a pure function that return an
> owned could look like this :
>
> {
>    IslandID __saved = gc_switch_new_island();
>    scope(exit) gc_restore_island(__saved);
>
>    call_pure_function();
> }
>
> This allow us to rely much less on the GC and allow for a better GC
> implementation.
>
> @nogc . Remember ? It was in the title. What does a @nogc function look
> like ? a no gc function o not produce any garbage or trigger the
> collection cycle. there is no reason per se to prevent the @nogc code to
> allocate on the GC as long as you know it won't produce garbage. That
> mean the only operation you need to ban are the one that merge the owned
> things into TL, shared or immutable heap.
>
> This solves the problem of the @nogc + Exception. As Exception are
> isolated, they can be allocated, throw and catched into @nogc code
> without generating garbage. They can safely bubble out of the @nogc
> section of the code and still be safe.
>
> The same way, it open the door for a LOT of code that is not @nogc to
> be. If the code allocate memory in an owned island and return it, then
> it is now up to the caller to decide whether is want's it garbage
> collected or keep it as owned (and/or make it reference counted for
> instance).
>
> The solution of passing a policy at compile for allocation is close to
> what C++'s stdlib is doing, and even if the proposed approach by Andrei
> is better, I don't think this is a good one. The proposed approach allow
> for a lot of code to be marked as @nogc and allow for the caller to
> decide. That is ultimately what we want libraries to look like.

Humm.

import std.stdio;

struct TypeOwnerShip(T) {
	T value;
	alias value this;

	this(T)(T value) {
		this.value = value;	
	}

	// implict casts to immutable, shared?
	// on cast to immutable, shared change islands
}

T owned(T)(T value) {
	return TypeOwnerShip!T(value);	
}

class Bar {
	int x;

	this(int x) pure {
		this.x = x;	
	}
}

Bar foo() pure {
	return owned(new Bar(5));
}

struct IslandAllocationStrategy {
	this(ubyte v = 0) {
	}

	void opWithIn() {
		writeln("opWithIn");
		// thread local overriding
	}

	void opWithOut() {
		import std.stdio;
		writeln("opWithOut");
		// reset thread local overriding
	}
}

@property IslandAllocationStrategy island() {
	return IslandAllocationStrategy();
}

void main() {
	writeln("start");
	with(island) {
		opWithIn;
		writeln("{");

		Bar myValue = foo();
		writeln(myValue.x);

		writeln("}");
		opWithOut;
	}
	writeln("end");
}

I feel like I've suggested this, just without the CS theory.