The Big Picture

Tue Feb 24 13:37:06 PST 2015

Several of us (deadalnix, Zach the Mystic, myself, probably 
others) have been putting forward some ideas related to memory 
management that involve ownership. A recent discussion made me 
realize that, while there were more or less concrete proposals 
for specific parts, there was no explanation of how they're all 
supposed to work together. I believe this may have led to 
significant misunderstandings.

In the following, I want to summarize my understanding of "The 
Big Picture" (which is probably not far from deadalnix's and 
Zach's ideas). Please note that this is not a proposal; even the 
exact semantics of the described concepts are not really 
important for this post.

Problem Statement
=================

There are many different strategies for managing resources like 
memory, file handles, OpenGL objects, etc. The most important 
ones are:

a) manual management (e.g. new/delete, malloc()/free(), 
open()/close())

b) unique/owned wrappers: there is only one reference to the 
resource; when that reference goes out of scope, the resource can 
be released

c) reference counting: there can be more than one reference; when 
the last one is dropped, the resource is released

d) (tracing) garbage collection: this is mostly used for memory 
resources
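
For readers who want something concrete, here is a rough sketch of the four strategies as they can be written in D today. `Resource` and the helper function names are made up for illustration; `Unique` and `RefCounted` are the existing Phobos wrappers from `std.typecons`, which only approximate strategies b) and c) and do not give the static guarantees discussed later in this post.

```d
import core.stdc.stdlib : free, malloc;
import std.typecons : RefCounted, Unique;

struct Resource { int id; }   // hypothetical stand-in for any resource

// a) manual management: every malloc() must be paired with a free()
int manual()
{
    auto p = cast(Resource*) malloc(Resource.sizeof);
    if (p is null) return -1;
    p.id = 1;
    immutable id = p.id;
    free(p);              // forgetting this leaks, doing it twice is a bug
    return id;
}

// b) unique wrapper: exactly one owner, not copyable
int viaUnique()
{
    Unique!Resource u = new Resource(2);
    return u.id;          // the Resource is destroyed when `u` leaves scope
}

// c) reference counting: released when the last copy is dropped
size_t viaRefCount()
{
    auto a = RefCounted!Resource(3);
    auto b = a;           // copying increments the count
    return a.refCountedStore.refCount;
}

// d) tracing GC: nobody releases explicitly, the collector decides
int viaGC()
{
    auto p = new Resource(4);
    return p.id;
}

void main()
{
    assert(manual() == 1);
    assert(viaUnique() == 2);
    assert(viaRefCount() == 2);
    assert(viaGC() == 4);
}
```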

D already provides several mechanisms to implement these 
strategies, but relies a lot on garbage collection. To improve 
that situation (which is part of the "Vision" for the near future 
[1]), we'd like to provide ways to use other strategies in a SAFE 
and EFFICIENT manner.

Requirements
============

It's clear that we have to work with the existing language 
(although still allowing some changes to it, ideally non-breaking 
ones), and that we as a project have only limited resources. 
Therefore, any demands we want to make against a possible 
solution must be weighed against the costs of satisfying them.

However, it is a good idea to start with requirements from an 
ideal solution, and see what can realistically be implemented. 
Here's my "wishlist":

1) Compatibility

There's a huge amount of existing code. An ideal solution should 
not only not break existing code, but also allow existing code to 
take advantage of the new features with as little change as 
possible.

2) Safety/Correctness

The compiler must statically disallow uses of a resource that are 
unsafe or incorrect for the chosen management strategy. Ideally, 
this applies not only to @safe-ty, but also to other kinds of 
correctness, like preventing access to a closed file handle.

3) Efficiency

Lack of performance is probably the most frequent reason for 
avoiding the GC. Therefore, our "dream solution" should not 
introduce unnecessary performance penalties itself. Just like 
template functions are expected to perform as well as 
hand-written specialized code, an RC wrapper, for example, should 
perform as well as hand-written (but tedious and potentially 
unsafe) manual reference counting.

4) Implementable in a Library

The language should provide the tools necessary to implement as 
much as possible in the standard library or in application code.

5) Composability

Most code, especially in libraries, shouldn't have to care about 
the underlying resource management strategy of the data it 
processes, nor should it impose a particular strategy on the 
user. Resource management strategy should be the responsibility 
of the client code. This principle should be followed to the 
greatest possible extent. Especially in light of point 4), this 
will make it possible to use user-defined RC implementations 
together with the standard library and other libraries.

6) Additional uses

A good feature is applicable outside of the use case it was 
introduced for. The more fundamental a change to the language 
is, the more important this becomes, so that the feature can pull 
its own weight.

Proposed Solution
=================

Most resource management problems are best described in terms of 
ownership. Therefore, it is natural to take the solution from the 
vast amount of research and practical experimentation that has 
been done in this field. Two things are proposed:

A) A way to limit the lifetime of resource handles (mostly 
references/pointers, but could be other things like file handles) 
to a particular lexical scope (the `scope` keyword is already 
designated for that purpose), together with a compiler-checkable 
escape hatch (`scope!identifier` in my suggestion [2], 
`return ref` in DIP25 [3]).

B) A way to bind ownership of a resource to a variable and ensure 
that this variable is the only (non-ephemeral) handle/reference 
to that resource. The uniqueness property can be exploited to 
provide many interesting guarantees.

The details of implementation and exact semantics of these two 
features are not important for the big picture. Let's call them 
SCOPE and UNIQUE from now on.

Evaluation
==========

Let's see how we fare:

1) Compatibility

Both new features are add-ons to the language, and are therefore 
opt-in. They won't affect existing programs at all. On the other 
hand, existing code cannot profit from them directly, but only 
small modifications are needed to enable that; see 5).

2) Safety/Correctness

The features will behave in such a way that guarantees that no 
references/handles to a resource are left over when the resource 
is destroyed (SCOPE). Moreover, UNIQUE will have features that 
allow for safe use in the situations exemplified below. (The 
details of this are out of scope here, but it has indeed been 
proven possible.)

3) Efficiency

SCOPE and UNIQUE objects by themselves have a memory layout 
identical to their "normal" counterparts. There are no inherent 
runtime penalties for them. In fact, they allow for certain 
performance optimizations.

Let's take reference counting as an example: an RC!T wrapper can 
decay to SCOPE(T), which lets the callee skip refcount 
manipulation entirely. Depending on the proposal, it can also 
decay to SCOPE(RC!T), which stays copyable but defers adjusting 
the refcount to the point where an actual copy is made.
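
To make the refcount elision tangible, here is a minimal sketch with a hand-rolled wrapper. Everything in it is hypothetical: `RC`, `borrow`, and the `adjustments` counter exist only for this illustration, and a plain `ref` parameter stands in for SCOPE(T), without the escape checking a real SCOPE would add. It only shows that passing the payload by reference touches the count zero times, while passing the wrapper by value touches it twice.

```d
import core.stdc.stdlib : free, malloc;

size_t adjustments;   // counts refcount updates, for illustration only

struct RC(T)          // hypothetical minimal wrapper, not Phobos' RefCounted
{
    private T* payload;
    private size_t* count;

    this(T value)
    {
        payload = cast(T*) malloc(T.sizeof);
        count = cast(size_t*) malloc(size_t.sizeof);
        assert(payload && count);
        *payload = value;
        *count = 1;
    }

    this(this) { if (count) { ++*count; ++adjustments; } }

    ~this()
    {
        if (count is null) return;
        ++adjustments;
        if (--*count == 0) { free(payload); free(count); }
    }

    // "Decay" to a plain reference; `return` ties it to this wrapper (DIP25).
    ref T borrow() return { return *payload; }
}

// Takes the wrapper itself: one increment on entry, one decrement on exit.
int readByValue(RC!int r) { return r.borrow(); }

// Takes only a borrowed reference: the refcount is never touched.
int readBorrowed(ref const int value) { return value; }

size_t costByValue()
{
    auto r = RC!int(42);
    adjustments = 0;
    immutable v = readByValue(r);
    assert(v == 42);
    return adjustments;
}

size_t costBorrowed()
{
    auto r = RC!int(42);
    adjustments = 0;
    immutable v = readBorrowed(r.borrow());
    assert(v == 42);
    return adjustments;
}

void main()
{
    assert(costByValue() == 2);
    assert(costBorrowed() == 0);
}
```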

UNIQUE allows conversion to immutable and shared without calling 
`idup` in many cases.
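
D already does a limited version of this today: the result of a `pure` function whose parameters cannot alias the return value is known to be unique and converts to `immutable` implicitly. The sketch below uses that existing rule; `makeSquares` is a made-up name, and a general UNIQUE would presumably cover many more cases than this one.

```d
// The result cannot alias anything the caller can see, so the
// compiler may treat it as unique.
int[] makeSquares(size_t n) pure
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

void main()
{
    // Implicit conversion from mutable to immutable, no .idup needed.
    immutable int[] squares = makeSquares(4);
    assert(squares == [0, 1, 4, 9]);
}
```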

4) Implementable in a Library

The language will provide a minimal, but sufficient 
implementation of SCOPE and UNIQUE. This can then be used as a 
building block to implement other things in user code.

5) Composability

As written above, only client code should decide about management 
strategy. Library code comes in two flavors: it can be a consumer 
of data (probably most cases), or it can be a producer.

Most consumers will only temporarily "look at" data they receive 
from client code, maybe make changes to it, but never keep a 
reference to it around after they return. This is exactly what 
SCOPE guarantees. Such consumers need to take all their data by 
SCOPE. UNIQUE(T) is implicitly convertible to SCOPE(T), and 
user-defined types can make themselves decay to SCOPE(T) using 
`alias this` or accessor methods. All kinds of data can then be 
passed to them, no matter whether it is refcounted, GC managed, a 
stack variable, a global variable, or manually managed.
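
As a rough illustration of such a consumer, the sketch below uses today's `scope` parameter storage class as a stand-in for SCOPE (how strictly it is checked depends on the compiler and its flags). `countSpaces` is a hypothetical example function; the point is only that one non-template function serves data from several management strategies.

```d
import core.stdc.stdlib : free, malloc;

// A pure "looker": it promises not to keep `text` after returning.
size_t countSpaces(scope const(char)[] text) @safe pure nothrow @nogc
{
    size_t n;
    foreach (c; text)
        if (c == ' ') ++n;
    return n;
}

void main()
{
    // GC-managed data
    string gcManaged = "a b c".idup;
    assert(countSpaces(gcManaged) == 2);

    // a stack variable
    char[3] onStack = "d e";
    assert(countSpaces(onStack[]) == 1);

    // manually managed memory
    auto p = cast(char*) malloc(5);
    assert(p !is null);
    scope (exit) free(p);
    p[0 .. 5] = "f g h";
    assert(countSpaces(p[0 .. 5]) == 2);
}
```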

Then there are the producers. These are things like `toString()` 
and `std.stdio.File`, which allocate memory or other resources 
and return them to client code. They shouldn't need to care what 
the client code wants to do with them. UNIQUE enables that: A 
UNIQUE(T) can be consumed by moving it into an RC!T, a T (which 
means the GC will manage it from then on), any other user defined 
type, or another UNIQUE(T). It can also be left to go out of 
scope, in which case it will be released automatically. Before 
that, it can also be passed as a SCOPE parameter or stored in a 
SCOPE variable.
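
A minimal sketch of the producer side, under loud assumptions: `Owned` is a hypothetical, hand-written stand-in for UNIQUE (non-copyable via `@disable this(this)`), `Report`, `makeReport`, `lengthOf` and the `releases` counter are invented for the example, and the storage comes from the GC purely to keep the code short. It shows the client, not the producer, choosing what happens to the result.

```d
import core.lifetime : move;

int releases;   // counts automatic releases, for illustration only

struct Report { string text; }

struct Owned(T)   // hypothetical stand-in for UNIQUE(T)
{
    private T* ptr;
    @disable this(this);   // not copyable: at most one owner

    ~this()
    {
        if (ptr) { destroy(*ptr); ptr = null; ++releases; }
    }

    // Give up ownership; here that means the GC manages it from now on.
    T* release() { auto p = ptr; ptr = null; return p; }
}

// The producer does not care what the client does with the result.
Owned!Report makeReport(string text)
{
    return Owned!Report(new Report(text));
}

// A consumer that only borrows.
size_t lengthOf(scope const(Report)* r) { return r.text.length; }

void main()
{
    {
        auto temp = makeReport("short-lived");
        assert(lengthOf(temp.ptr) == 11);   // borrowed, owner unchanged
    }                                       // released automatically here
    assert(releases == 1);

    // Hand the object over to the GC instead.
    Report* kept = makeReport("long-lived").release();
    assert(kept.text == "long-lived");
    assert(releases == 1);

    {
        auto a = makeReport("moved");
        auto b = move(a);                   // transfer to another owner
        assert(a.ptr is null && b.ptr !is null);
    }
    assert(releases == 2);                  // released once, by `b`
}
```

Moving into a refcounted wrapper would follow the same pattern as `release()`, with the wrapper taking over the pointer.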

(Allocators will play an important role here in practice, but 
they are actually a different topic: memory allocation != memory 
management.)

Without these two features, consumers would need to be 
specialized for the various types (=> template bloat or manual 
work), and producers would either need to decide on one return 
type or make it configurable (=> template bloat).

6) Additional uses

UNIQUE in particular has interesting additional uses. @nogc 
exceptions are one example, safe message passing (transfer of 
entire graphs of objects to other threads) is another. Some 
variants of ownership also provide ways to prevent iterator 
invalidation.

[1] http://wiki.dlang.org/Vision/2015H1
[2] http://wiki.dlang.org/User:Schuetzm/scope
[3] http://wiki.dlang.org/DIP25