The Big Picture
via Digitalmars-d
digitalmars-d at puremagic.com
Tue Feb 24 13:37:06 PST 2015
Several of us (deadalnix, Zach the Mystic, myself, probably
others) have been putting forward some ideas related to memory
management that involve ownership. A recent discussion made me
realize that, while there were more or less concrete proposals
for specific parts, there was no explanation of how they're all
supposed to work together. I believe this may have led to
significant misunderstandings.
In the following, I want to summarize my understanding of "The
Big Picture" (which is probably not far from deadalnix's and
Zach's ideas). Please note that this is not a proposal; even the
exact semantics of the described concepts are not really
important for this post.
Problem Statement
=================
There are many different strategies for managing resources like
memory, file handles, OpenGL objects, etc. The most important
ones are:
a) manual management (e.g. new/delete, malloc()/free(),
open()/close())
b) unique/owned wrappers: there is only one reference to the
resource; when that reference goes out of scope, the resource can
be released
c) reference counting: there can be more than one reference; when
the last one is dropped, the resource is released
d) (tracing) garbage collection: this is mostly used for memory
resources
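To make strategies b) and c) concrete, here is a small sketch in Rust. Rust is used only as an analogy, because its standard `Box` and `Rc` types implement exactly these two strategies; this is not D code. The `Resource` type and its `released` counter are hypothetical and exist only to make the release point observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a resource (memory, file handle, ...).
/// It bumps `released` when it is released, so the release point is observable.
struct Resource<'a> {
    released: &'a Cell<u32>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.released.set(self.released.get() + 1);
    }
}

/// b) Unique ownership: the single owner releases the resource at end of scope.
fn unique_strategy(released: &Cell<u32>) -> u32 {
    {
        let _owner = Box::new(Resource { released });
        assert_eq!(released.get(), 0); // still alive: the only handle is in scope
    } // `_owner` goes out of scope here, so the resource is released
    released.get()
}

/// c) Reference counting: released only when the last handle is dropped.
/// Returns the release count after dropping the first, then the second handle.
fn rc_strategy(released: &Cell<u32>) -> (u32, u32) {
    let first = Rc::new(Resource { released });
    let second = Rc::clone(&first); // two handles, count = 2
    drop(first); // count = 1: resource still alive
    let after_first = released.get();
    drop(second); // count = 0: resource released
    (after_first, released.get())
}
```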
D already provides several mechanisms to implement these
strategies, but relies heavily on garbage collection. To improve
that situation (which is part of the "Vision" for the near future
[1]), we'd like to provide ways to use other strategies in a SAFE
and EFFICIENT manner.
Requirements
============
It's clear that we have to work with the existing language
(although still allowing some changes to it, ideally non-breaking
ones), and that we as a project have only limited resources.
Therefore, any requirements we place on a possible
solution must be weighed against the costs of satisfying them.
However, it is a good idea to start with the requirements of an
ideal solution and see what can realistically be implemented.
Here's my "wishlist":
1) Compatibility
There's a huge amount of existing code. An ideal solution should
not only avoid breaking existing code, but also allow existing code to
take advantage of the new features with as little change as
possible.
2) Safety/Correctness
The compiler must statically disallow uses of a resource that are
unsafe or incorrect for the chosen management strategy. Ideally,
this applies not only to @safe-ty, but also to other kinds of
correctness, like preventing access to a closed file handle.
3) Efficiency
Lack of performance is probably the most frequent reason for
avoiding the GC. Therefore, our "dream solution" should not
introduce unnecessary performance penalties itself. Just like
template functions are expected to perform as well as
hand-written specialized code, an RC wrapper, for example, should
perform as well as hand-written (but tedious and potentially
unsafe) manual reference counting.
4) Implementable in a Library
The language should provide the tools necessary to implement as
much as possible in the standard library or in application code.
5) Composability
Most code, especially in libraries, shouldn't have to care about
the underlying resource management strategy of the data it
processes, nor should it impose a particular strategy on the
user. Resource management strategy should be the responsibility
of the client code. This principle should be followed to the
greatest possible extent. Especially in light of point 4), this
will make it possible to use user-defined RC implementations
together with the standard library and other libraries.
6) Additional uses
A good feature is applicable beyond the use case it was
introduced for. The more fundamental a change to the language is,
the more important this becomes, so that the feature can pull its
own weight.
Proposed Solution
=================
Most resource management problems are best described in terms of
ownership. Therefore, it is natural to look for a solution in the
vast amount of research and practical experimentation that has
been done in this field. Two things are proposed:
A) A way to limit the lifetime of resource handles (mostly
references/pointers, but could be other things like file handles)
to a particular lexical scope (the `scope` keyword is already
designated for that purpose), as well as providing a
compiler-checkable escape hatch (`scope!identifier` in my
suggestion [2], `return ref` in DIP25 [3]).
B) A way to bind ownership of a resource to a variable and ensure
that this variable is the only (non-ephemeral) handle/reference
to that resource. The uniqueness property can be exploited to
provide many interesting guarantees.
The details of implementation and exact semantics of these two
features are not important for the big picture. Let's call them
SCOPE and UNIQUE from now on.
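Rust's borrow checker enforces rough analogues of both features, so a Rust sketch may help readers who think more easily in code. This is an analogy under that assumption, not the proposed D syntax or semantics. A borrowed `&T` plays the role of SCOPE, a lifetime-annotated return plays the role of the escape hatch, and moving a `Box` plays the role of UNIQUE. The function names are hypothetical.

```rust
/// SCOPE analogue: `data` is only borrowed for the duration of the call;
/// the compiler rejects any attempt to store it beyond that.
fn count_nonzero(data: &[i32]) -> usize {
    data.iter().filter(|&&x| x != 0).count()
}

/// Escape-hatch analogue (compare `return ref` / `scope!identifier`):
/// the returned reference may leave the function, but the lifetime `'a`
/// ties it to `data`, so it can never outlive the borrowed input.
fn first<'a>(data: &'a [i32]) -> Option<&'a i32> {
    data.first()
}

/// UNIQUE analogue: `Box<[i32]>` is the only handle to its allocation.
/// Passing it by value moves ownership; the caller can no longer use it.
fn take_ownership(owned: Box<[i32]>) -> usize {
    owned.len()
} // `owned` is released here
```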
Evaluation
==========
Let's see how we fare:
1) Compatibility
Both new features are add-ons to the language, and are therefore
opt-in. They won't affect existing programs at all. On the other
hand, existing code cannot benefit from them directly, but it
needs only small modifications to enable that (see 5)).
2) Safety/Correctness
The features will guarantee that no references/handles to a
resource are left over when the resource is destroyed (SCOPE).
Moreover, UNIQUE will have features that allow safe use in the
situations described below. (The details are out of scope here,
but this has indeed been shown to be possible.)
3) Efficiency
SCOPE and UNIQUE objects by themselves have a memory layout
identical to their "normal" counterparts. There are no inherent
runtime penalties for them. In fact, they allow for certain
performance optimizations.
Let's take reference counting as an example: an RC!T wrapper can
decay to SCOPE(T), which enables it to elide refcount
manipulation entirely, and - depending on the proposal - to
SCOPE(RC!T), which keeps it copyable but defers adjusting the
refcount until a copy is actually made.
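A rough Rust analogy of the two decay paths follows. It is not D and not the proposed semantics. `Rc` is Rust's refcounted wrapper, a plain `&[T]` borrow corresponds to SCOPE(T), and a borrowed `&Rc<T>` corresponds to SCOPE(RC!T). The helper functions are hypothetical.

```rust
use std::rc::Rc;

/// SCOPE(T) analogue: the callee only borrows the elements,
/// so no refcount is touched at all.
fn total(data: &[u64]) -> u64 {
    data.iter().sum()
}

/// For comparison: taking the counted handle by value forces an
/// increment at the call site (via `Rc::clone`) and a decrement on return.
fn total_counted(data: Rc<Vec<u64>>) -> (u64, usize) {
    let sum: u64 = data.iter().sum();
    (sum, Rc::strong_count(&data))
}

/// SCOPE(RC!T) analogue: borrowing the handle keeps it copyable,
/// but the count changes only if a copy is actually made.
fn keep_if_large(data: &Rc<Vec<u64>>, limit: u64) -> Option<Rc<Vec<u64>>> {
    if total(data) > limit {
        Some(Rc::clone(data))
    } else {
        None
    }
}
```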
UNIQUE allows conversion to immutable and shared without calling
`idup` in many cases.
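In Rust terms (an analogy only), this corresponds to building data through a uniquely owned `Vec` and then moving it into an `Arc` for shared, read-only access. Because nobody else can hold a reference, the move reuses the existing buffer instead of copying it the way `idup` does. The function name `freeze_and_share` is hypothetical.

```rust
use std::sync::Arc;
use std::thread;

/// Builds data through a unique, mutable owner, then "freezes" it into a
/// shared, immutable handle. Moving the uniquely owned Vec into `Arc`
/// moves only its header; the heap buffer is reused, not copied.
fn freeze_and_share(n: u32) -> (bool, u32) {
    let mut data: Vec<u32> = Vec::new();
    for i in 1..=n {
        data.push(i);
    }
    let buffer_before = data.as_ptr();

    let shared: Arc<Vec<u32>> = Arc::new(data); // move, not copy
    let same_buffer = shared.as_ptr() == buffer_before;

    // The immutable, shared data can now be read from another thread.
    let for_thread = Arc::clone(&shared);
    let sum = thread::spawn(move || for_thread.iter().sum::<u32>())
        .join()
        .unwrap();
    (same_buffer, sum)
}
```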
4) Implementable in a Library
The language will provide a minimal, but sufficient
implementation of SCOPE and UNIQUE. This can then be used as a
building block to implement other things in user code.
5) Composability
As written above, only client code should decide about management
strategy. Library code comes in two flavors: it can be a consumer
of data (probably most cases), or it can be a producer.
Most consumers will only temporarily "look at" data they receive
from client code, maybe make changes to it, but never keep a
reference to it around after they return. This is exactly what
SCOPE guarantees. Such consumers need to take all their data by
SCOPE. UNIQUE(T) is implicitly convertible to SCOPE(T), and
user-defined types can make themselves decay to SCOPE(T) using
`alias this` or accessor methods. All kinds of data can then be
passed to them, no matter whether it is refcounted, GC managed, a
stack variable, a global variable, or manually managed.
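As a Rust analogy (not D syntax), a consumer that takes a plain borrow `&str` can be called with data managed in almost any of the ways listed above, because every owner can lend out a temporary borrow. `longest_word`, `GLOBAL` and `demo` are hypothetical names used for illustration.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// A library "consumer": it only looks at `text` during the call and keeps
/// no reference afterwards, so a plain borrow (the SCOPE analogue) suffices.
fn longest_word(text: &str) -> usize {
    text.split_whitespace().map(|w| w.len()).max().unwrap_or(0)
}

/// Hypothetical global data.
static GLOBAL: &str = "a global string";

/// Calls the same consumer with differently managed data.
fn demo() -> Vec<usize> {
    let owned = String::from("owned heap string"); // uniquely owned
    let boxed: Box<str> = "boxed unique str".into(); // uniquely owned box
    let counted: Rc<str> = Rc::from("reference counted text"); // refcounted
    let atomic: Arc<str> = Arc::from("atomically counted"); // atomic refcount
    let stack_bytes = *b"on the stack"; // byte array copied onto the stack
    let on_stack = std::str::from_utf8(&stack_bytes).unwrap();
    vec![
        longest_word(&owned),
        longest_word(&boxed),
        longest_word(&counted),
        longest_word(&atomic),
        longest_word(on_stack),
        longest_word(GLOBAL),
    ]
}
```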
Then there are the producers. These are things like `toString()`
and `std.stdio.File`, which allocate memory or other resources
and return them to client code. They shouldn't need to care what
the client code wants to do with them. UNIQUE enables that: A
UNIQUE(T) can be consumed by moving it into an RC!T, a T (which
means the GC will manage it from then on), any other user defined
type, or another UNIQUE(T). It can also be left to go out of
scope, in which case it will be released automatically. Before
that, it can also be passed as a SCOPE parameter or stored in a
SCOPE variable.
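Here is a Rust analogy of the producer side, assuming a returned `String` stands in for UNIQUE(T). The producer allocates and returns sole ownership, and the client chooses what to do with it. Rust has no tracing GC, so the "move it into a T" option has no direct counterpart here. `render`, `client_choices` and `borrowed_len` are hypothetical names.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// A library "producer": it allocates and hands back a uniquely owned value
/// without deciding how the client will manage it afterwards.
fn render(n: i32) -> String {
    format!("value = {}", n)
}

/// A consumer that only borrows (the SCOPE analogue).
fn borrowed_len(s: &str) -> usize {
    s.len()
}

fn client_choices() -> (usize, usize, usize) {
    // Keep the value uniquely owned; it is released when it goes out of scope.
    let unique: String = render(1);
    // Move it into a reference-counted handle; the text itself is not copied.
    let counted: Rc<String> = Rc::new(render(22));
    // Move it into an atomically counted handle that can cross threads.
    let shared: Arc<String> = Arc::new(render(333));
    // Or simply drop it: the allocation is released automatically.
    drop(render(4));
    // In every case the value can still be lent out as a plain borrow.
    (borrowed_len(&unique), borrowed_len(&counted), borrowed_len(&shared))
}
```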
(Allocators will play an important role here in practice, but
they are actually a different topic: memory allocation != memory
management.)
Without these two features, consumers would need to be
specialized for the various types (=> template bloat or manual
work), and producers would either need to decide on one return
type or make it configurable (=> template bloat).
6) Additional uses
UNIQUE in particular has interesting additional uses. @nogc
exceptions are one example, safe message passing (transfer of
entire graphs of objects to other threads) is another. Some
variants of ownership also provide ways to prevent iterator
invalidation.
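Safe message passing can be sketched in Rust. This is an analogy, not the proposed D mechanism. A tree of uniquely owned nodes is sent through a channel, and after `send` the sending thread holds no handle to any node, so the receiver can use it without locks. `Node`, `sum_tree` and `hand_off` are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// A small tree ("graph of objects") built from uniquely owned nodes.
struct Node {
    value: u32,
    children: Vec<Box<Node>>,
}

fn sum_tree(node: &Node) -> u32 {
    node.value + node.children.iter().map(|c| sum_tree(c)).sum::<u32>()
}

/// Builds a tree on this thread and transfers ownership of the whole
/// structure to a worker thread through a channel.
fn hand_off() -> u32 {
    let leaf = |value| Box::new(Node { value, children: Vec::new() });
    let tree = Node {
        value: 1,
        children: vec![leaf(2), Box::new(Node { value: 3, children: vec![leaf(4)] })],
    };
    let (tx, rx) = mpsc::channel::<Node>();
    let worker = thread::spawn(move || sum_tree(&rx.recv().unwrap()));
    tx.send(tree).unwrap();
    // `tree` has been moved; using it here would be a compile-time error.
    worker.join().unwrap()
}
```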
[1] http://wiki.dlang.org/Vision/2015H1
[2] http://wiki.dlang.org/User:Schuetzm/scope
[3] http://wiki.dlang.org/DIP25