Tasks, actors and garbage collection

Tue Apr 20 16:21:39 UTC 2021

On Tuesday, 20 April 2021 at 09:52:07 UTC, Ola Fosheim Grøstad 
wrote:
> As computer memory grows, naive scan and sweep garbage 
> collection becomes more and more a burden.
>
> Also, languages have not really come up with a satisfactory way 
> to simplify multi-threaded programming, except to split the 
> workload into many single-threaded tasks that are run in 
> parallel.
>
> It seems to me that the obvious way to retain the ease of use 
> that garbage collection provides without impeding performance 
> is to limit the memory to scan, and preferably do the scanning 
> when nobody is using the memory.
>
> The actor model seems to be a good fit. Or call it a task, if 
> you wish. If each actor/task has its own GC pool then there is 
> less memory to scan, and you can do the scanning when the 
> actor/task is waiting on I/O or scheduling. So you would get 
> less intrusive scanning pauses. It would also fit well with 
> async-await/futures.
>
> Another benefit is that if an actor is deleted before it is 
> scanned, then no scanning is necessary at all. It can simply be 
> released (assuming destructor-free classes are allocated in a 
> separate area). This is of great benefit to web-services, they 
> can simply implement a request-handler as an actor/task.
>
> The downside is that you need a non-GC mechanism for dealing 
> with inter-actor/task communication. Such as reference 
> counting, however that should be quite ok, as you would expect 
> the time-consuming stuff to happen within an actor/task as well 
> as complex allocation patterns.
>
> Is this a direction D is able to move in or is a new language 
> needed?

A few years ago, when [`std.experimental.allocator`][0] was still 
hot out of the oven, I thought that this would be one of the primary 
innovations it would enable.

The basic idea is that since allocators are composable 
first-class objects, you can pass one to any function and thereby 
override and customize that function's memory allocation policy, 
without resorting to global variables.
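
To make that concrete, here is a minimal sketch of a function that takes its allocator as a parameter. The helper name `squares` is made up for illustration; `makeArray`, `dispose`, `Mallocator` and `GCAllocator` are the ones from `std.experimental.allocator`.

```d
import std.experimental.allocator : makeArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;
import std.experimental.allocator.gc_allocator : GCAllocator;

// Hypothetical helper: fills an array with 0, 1, 4, 9, ... using
// whatever allocator the caller hands in.
int[] squares(Allocator)(auto ref Allocator alloc, size_t n)
{
    int[] result = alloc.makeArray!int(n);
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

void main()
{
    // Same function, two allocation policies, chosen at the call site.
    int[] a = squares(Mallocator.instance, 4);
    scope (exit) Mallocator.instance.dispose(a); // malloc'ed memory is freed manually

    int[] b = squares(GCAllocator.instance, 4);  // GC memory needs no dispose

    assert(a == [0, 1, 4, 9]);
    assert(b == a);
}
```

The callee never names a global; the caller decides where the memory comes from.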

(The package does provide convenience [thread-local][1] and 
[global variables][2], but IMO that's an anti-pattern: if you 
prefer simplicity, you can either use the GC (as always) or 
`Mallocator` directly. IMO, if you're reaching for 
`std.experimental.allocator`, you're doing so to gain more 
control over memory management. Also, knowing whether 
`theAllocator` points to `GCAllocator` or to an actually separate 
thread-local allocator can be critical for ensuring that code is 
lock-free. Either you know what you're doing, or the code is not 
performance-critical, in which case it doesn't matter and you 
should be using the GC anyway.)

By passing the allocator as an object, you allow it to be used 
safely from `pure` functions. (If `pure` functions were to 
somehow be allowed to use those global allocator variables, you 
could have some ugly consequences. For example, a pure function 
can be preempted in the middle of its execution, only to have the 
global allocator replaced under its feet, thereby leaving all the 
memory allocated from the previous allocator dangling.)

Pure code (even in the relaxed D sense) is great for parallelism, 
as a scheduler can essentially assume that it's both lock-free 
and wait-free - it doesn't need to interact with any other 
thread/fiber/task to make progress.
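
As a rough illustration of the `pure` point, the sketch below uses a toy bump allocator instead of a real Phobos one. `BumpAllocator` and `countUp` are hypothetical names. The function stays (weakly) `pure` because the only state it mutates is reachable through its parameters.

```d
// Hypothetical bump allocator over storage supplied by its owner.
struct BumpAllocator
{
    ubyte[] buffer; // backing storage
    size_t used;    // bytes handed out so far

    // Weakly pure: touches only state reachable through `this`.
    void[] allocate(size_t bytes) pure nothrow @nogc @safe
    {
        if (bytes > buffer.length - used)
            return null; // out of space
        auto block = buffer[used .. used + bytes];
        used += bytes;
        return block;
    }
}

// Calling the `theAllocator` property in here would not compile,
// because that property is not `pure`. A parameter is fine.
int[] countUp(ref BumpAllocator alloc, size_t n) pure nothrow @nogc
{
    void[] raw = alloc.allocate(n * int.sizeof);
    if (raw is null)
        return null;
    auto result = cast(int[]) raw;
    foreach (i, ref x; result)
        x = cast(int) i;
    return result;
}

void main()
{
    auto arena = BumpAllocator(new ubyte[64]);
    int[] xs = countUp(arena, 4);
    assert(xs == [0, 1, 2, 3]);
    assert(arena.used == 4 * int.sizeof);
    assert(countUp(arena, 100) is null); // 400 bytes do not fit
}
```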

Having multiple per-thread/fiber/actor/task GC heaps fits 
naturally in the model you propose. There could be a new 
`LocalGCAllocator`, which the runtime / framework can simply pass 
to the actor on its creation. There are two main challenges:
1. Ensuring code doesn't break the assumptions of the actor model 
by e.g. sharing memory between threads in an uncontrolled manner. 
This can be addressed in a variety of ways:
     * The framework's build-system can prevent you from importing 
code that doesn't fit its model
     * The framework can run a non-optional linter as part of the 
build process, which would ensure that you don't have:
         * `@system` or `@trusted` code
         * `extern` function declarations (otherwise you could 
define `@safe pure int printf(scope const char* format, scope 
const ...);`)
     * reference capabilities like [Pony][3]'s
     * other type-system or language built-in static analysis
2. Making it as ergonomic and easy to use as the GC. 
Essentially having all language and library features that 
currently require the GC use `LocalGCAllocator` automagically.
   I think this can be done in several steps:
     * Finish transitioning druntime's compiler interface from 
unchecked "magic" extern(C) functions to regular D (template) 
functions
     * Add `context` as the last parameter to each druntime 
function that may need to allocate memory, and set its default value 
to the global GC context. This is a pure refactoring, no change 
in behavior.
     * Add Scala-style `implicit` parameters [⁴][4] [⁵][5] [⁶][6] [⁷][7] 
[⁸][8] to the language and mark the `context` parameters as 
`implicit`
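
The second step could look roughly like the sketch below. Everything in it is hypothetical (`AllocContext`, `GlobalGCContext`, `LocalContext`, `globalContext`, `newArray`); none of these exist in druntime. D has no `implicit` parameters today, so the per-actor context is passed explicitly here, which is exactly the boilerplate the third step would remove.

```d
// Hypothetical allocation context; not an existing druntime type.
interface AllocContext
{
    void[] allocate(size_t bytes);
}

// Stand-in for "the global GC context".
final class GlobalGCContext : AllocContext
{
    void[] allocate(size_t bytes) { return new ubyte[bytes]; }
}

// Stand-in for a per-actor heap. It only counts bytes and still
// borrows memory from the GC to keep the sketch short.
final class LocalContext : AllocContext
{
    size_t bytesAllocated;

    void[] allocate(size_t bytes)
    {
        bytesAllocated += bytes;
        return new ubyte[bytes];
    }
}

AllocContext globalContext()
{
    static AllocContext instance; // thread-local, created lazily
    if (instance is null)
        instance = new GlobalGCContext;
    return instance;
}

// `context` comes last and defaults to the global GC context, so
// existing call sites keep compiling and behaving as before.
T[] newArray(T)(size_t n, AllocContext context = globalContext())
{
    return cast(T[]) context.allocate(n * T.sizeof);
}

void main()
{
    int[] a = newArray!int(3);        // unchanged call site

    auto local = new LocalContext;    // what a framework might give an actor
    int[] b = newArray!int(3, local); // explicit today, `implicit` later

    assert(a.length == 3 && b.length == 3);
    assert(local.bytesAllocated == 3 * int.sizeof);
}
```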

[0]: https://dlang.org/phobos/std_experimental_allocator.html
[1]: https://dlang.org/phobos/std_experimental_allocator.html#theAllocator
[2]: https://dlang.org/phobos/std_experimental_allocator.html#.processAllocator
[3]: https://tutorial.ponylang.io/reference-capabilities/reference-capabilities.html
[4]: https://scala-lang.org/files/archive/spec/2.13/07-implicits.html#implicit-parameters
[5]: https://docs.scala-lang.org/tour/implicit-parameters.html
[6]: https://docs.scala-lang.org/tutorials/FAQ/finding-implicits.html
[7]: https://stackoverflow.com/questions/10375633/understanding-implicit-in-scala
[8]: https://dzone.com/articles/scala-implicits-presentations