On heap segregation, GC optimization and @nogc relaxing

Wed Nov 12 12:36:14 PST 2014

12-Nov-2014 05:34, deadalnix wrote:
> Hi all,
>
> I want to get back to the subject of ownership and lifetime and propose
> a solution, but first I'd like to state the problem in a way that I
> haven't seen before (even if I have no doubt some have come to the same
> conclusion in the past).

[snip nice summary]
>
> In that world, D is in a bizarre position where it uses a combination of
> annotations (immutable, shared) and GC. Ultimately, this is a good
> solution: use annotations for the common cases, and fall back on
> GC/unsafe code when these annotations fall short.

Aye.

> Before going into why it is falling short, a digression on GC and the
> benefits of segregating the heap. In D, the heap is almost segregated into
> 3 groups: thread local, shared and immutable. These groups are very
> interesting for the GC:
>   - The thread local heap can be collected while disturbing only one
> thread. It should be possible to use different strategies in different
> threads.
>   - The immutable heap can be collected 100% concurrently without any
> synchronization with the program.
>   - The shared heap is the only one that requires disturbing the whole
> program, but as a matter of good practice, this heap should be small
> anyway.
>
> Various ML family languages (like OCaml) have adopted a segregated heap
> strategy and get great benefit out of it. For instance, OCaml's GC is
> known to outperform Java's in most scenarios.

+1000
We should take advantage of the segregated heap to make all the complexity
of shared/immutable/TL finally pay off.

> I'd argue for the introduction of a basic ownership system. Something
> much simpler than Rust's, that does not cover all use cases. But the good
> thing is that we can fall back on GC or unsafe code when the system shows
> its limits. That means we rely less on the GC, while being able to
> provide a better GC.
>
> We already pay a cost at interfaces with type qualifiers, so let's make
> the best of it! I'm proposing to introduce a new type qualifier for owned
> data.
>
> Now it means that the throw statement expects an owned(Throwable), that
> pure functions that currently return an implicitly unique object will
> return owned(Object), and that message passing will accept passing around
> owned stuff.
>
> The GC heap can be segregated into islands. We currently have 3 types of
> islands: thread local, shared and immutable. These are built-in islands
> with special characteristics in the language. The new qualifier
> introduces a new type of island, the owned island.
>

Seems sane. owned(Exception) would be implicitly assumed i.e.:
catch(Exception e){ ... }

would be seen by compiler as:
catch(owned(Exception) e){ ... }

What happens if I throw an l-value exception? Do I need to cast it or
call assumeOwned on it?

It's easy to see how it goes with r-values, such as new Exception(...), 
since they are "unique expressions" whatever that means ;)
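
For reference, this is roughly what the existing "implicitly unique" rule for pure functions looks like in today's D, which is the behaviour the proposal would turn into owned(T). It is only a small sketch; makeSquares is a made-up name for illustration, and nothing here uses the proposed qualifier:

```d
// A pure factory: its parameters carry no mutable indirections, so the
// compiler knows the returned array cannot be aliased anywhere else.
int[] makeSquares(size_t n) pure
{
    auto a = new int[](n);
    foreach (i, ref x; a)
        x = cast(int)(i * i);
    return a;
}

void main()
{
    // The unique result converts implicitly to immutable...
    immutable(int)[] frozen = makeSquares(4);
    // ...or stays ordinary thread-local mutable data.
    int[] local = makeSquares(4);

    assert(frozen == [0, 1, 4, 9]);
    assert(local == frozen);
}
```

The caller picks the destination (immutable or thread local) without a cast, which is the same "decide at the call site" idea as the decay of owned(T) described in the proposal.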

> An owned island can only refer to other owned islands and to immutable.
> Owned islands can be merged into any other island at any time (that is
> why they can't refer to TL or shared).
>
> owned(T) can be passed around as function parameter or returned, or
> stored as fields. When doing so they are consumed. When an owned is not
> consumed and goes out of scope, the whole island is freed.
>
> That means that owned(T) can implicitly decay into T, immutable(T) or
> shared(T) at any time. When doing so, a call to the runtime is made to
> merge the owned island into the corresponding island. If it is passed
> around as owned, then the ownership is transferred and all local
> references to the island are invalidated (using them is an error).
>
> On an implementation level, a call to a pure function that returns an
> owned could look like this:
>
> {
>    IslandID __saved = gc_switch_new_island();
>    scope(exit) gc_restore_island(__saved);
>
>    call_pure_function();
> }
>
> This allows us to rely much less on the GC and allows for a better GC
> implementation.

I take it that owned(T) is implicitly deduced by the compiler in the case
of pure functions? Also, it seems templates should not take owned(T) into
consideration and should let it decay... How does owned compose with other
qualifiers?
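
To check my understanding of the consume/decay rules, here is a rough library-level approximation. Everything in it is an assumption for illustration: Owned, release and Node are hypothetical names, and a struct wrapper like this cannot do real island merging or compile-time invalidation the way a built-in qualifier would:

```d
/// Hypothetical stand-in for owned(T): holds the only reference to an object.
struct Owned(T) if (is(T == class))
{
    private T payload;

    @disable this(this); // no copies: an owned reference can only be moved

    this(T p) { payload = p; }

    ~this()
    {
        // Never consumed: tear the object down when the owner leaves scope.
        if (payload !is null)
        {
            destroy(payload);
            payload = null;
        }
    }

    /// "Decay" into an ordinary GC reference; the owner is left empty.
    T release()
    {
        auto p = payload;
        payload = null;
        return p;
    }
}

class Node
{
    static int destroyed; // counts destructor runs, for demonstration only
    int value;
    this(int v) { value = v; }
    ~this() { ++destroyed; }
}

void main()
{
    {
        auto a = Owned!Node(new Node(1));
    } // not consumed, so the Node is destroyed here
    assert(Node.destroyed == 1);

    Node kept;
    {
        auto b = Owned!Node(new Node(2));
        kept = b.release(); // consumed: now an ordinary GC-managed object
    }
    assert(Node.destroyed == 1);
    assert(kept.value == 2);
}
```

What the wrapper cannot express is exactly what I am asking about: how the compiler would deduce owned for pure functions and how it would interact with const, immutable and shared.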

>
> @nogc. Remember? It was in the title. What does a @nogc function look
> like? A @nogc function does not produce any garbage or trigger a
> collection cycle. There is no reason per se to prevent @nogc code from
> allocating on the GC heap as long as you know it won't produce garbage.
> That means the only operations you need to ban are the ones that merge
> the owned things into the TL, shared or immutable heap.
>
> This solves the problem of @nogc + Exception. As Exceptions are
> isolated, they can be allocated, thrown and caught in @nogc code
> without generating garbage. They can safely bubble out of the @nogc
> section of the code and still be safe.
>

Seems absolutely cool. But doesn't allocating an exception touch the heap
anyway? I take it that if I don't save the exception explicitly anywhere,
the owned island is destroyed at the end of the catch scope?
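
For comparison, this is the restriction as it stands today, which the proposal would relax. A small sketch, assuming default compiler settings; nogcCanThrowNew and plainCanThrowNew are made-up names:

```d
// Does a @nogc function literal that allocates an exception with `new` compile?
enum bool nogcCanThrowNew =
    __traits(compiles, () @nogc { throw new Exception("boom"); });

// The same body without @nogc, as a sanity check.
enum bool plainCanThrowNew =
    __traits(compiles, () { throw new Exception("boom"); });

void main()
{
    // Current rule: any `new` is rejected in @nogc code, exceptions included.
    static assert(!nogcCanThrowNew);
    static assert(plainCanThrowNew);
}
```

Under the owned-island scheme the first check would presumably flip, since the allocation itself would be allowed and only the merge into the TL, shared or immutable heap would be banned.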

> In the same way, it opens the door for a LOT of code that is not @nogc
> to be. If the code allocates memory in an owned island and returns it,
> then it is now up to the caller to decide whether it wants it garbage
> collected or wants to keep it as owned (and/or make it reference counted,
> for instance).
>
> The solution of passing an allocation policy at compile time is close to
> what C++'s stdlib is doing, and even if the approach proposed by Andrei
> is better, I don't think it is a good one. The approach proposed here
> allows a lot of code to be marked as @nogc and lets the caller decide.
> That is ultimately what we want libraries to look like.

I'm not sure I get all the details, but I like your proposal MUCH better
than forcibly introducing ref-counting.

-- 
Dmitry Olshansky