RFC: moving forward with @nogc Phobos

Tue Sep 30 12:10:17 PDT 2014

Ok, here are my few cents:

On Monday, 29 September 2014 at 10:49:53 UTC, Andrei Alexandrescu 
wrote:
> Back when I've first introduced RCString I hinted that we have 
> a larger strategy in mind. Here it is.
>
> The basic tenet of the approach is to reckon and act on the 
> fact that memory allocation (the subject of allocators) is an 
> entirely distinct topic from memory management, and more 
> generally resource management. This clarifies that it would be 
> wrong to approach alternatives to GC in Phobos by means of 
> allocators. GC is not only an approach to memory allocation, 
> but also an approach to memory management. Reducing it to 
> either one is a mistake. In hindsight this looks rather obvious 
> but it has caused me and many people better than myself a lot 
> of headache.

I would argue that GC is at its core _only_ a memory management 
strategy. It just so happens that the one in D's runtime also 
comes with an allocator, with which it is tightly integrated. In 
theory, a GC can work with any (and multiple) allocators, and you 
could of course also call GC.free() manually, because, as you 
say, management and allocation are entirely distinct topics.
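
For example, druntime's `core.memory.GC` already lets you use the collector's allocator while taking over management yourself. Below is a minimal sketch in which `manualBuffer` is a hypothetical helper. The block comes from `GC.malloc`, but its lifetime ends with an explicit `GC.free()` rather than with a collection.

```d
import core.memory : GC;

// Hypothetical helper: allocate `n` ints from the GC heap, but leave the
// management to the caller, who must pass the block to GC.free() when done.
int[] manualBuffer(size_t n)
{
    // NO_SCAN: the block holds no pointers, so the GC need not scan it.
    auto p = cast(int*) GC.malloc(n * int.sizeof, GC.BlkAttr.NO_SCAN);
    return p[0 .. n];
}

unittest
{
    auto buf = manualBuffer(4);
    buf[] = 42;
    assert(buf.length == 4 && buf[3] == 42);
    // Deterministic release; no collection cycle involved.
    GC.free(buf.ptr);
}
```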

>
> That said allocators are nice to have and use, and I will 
> definitely follow up with std.allocator. However, std.allocator 
> is not the key to a @nogc Phobos.

Agreed.

>
> Nor are ranges. There is an attitude that either output ranges, 
> or input ranges in conjunction with lazy computation, would 
> solve the issue of creating garbage. 
> https://github.com/D-Programming-Language/phobos/pull/2423 is a 
> good illustration of the latter approach: a range would be 
> lazily created by chaining stuff together. A range-based 
> approach would take us further than the allocators, but I see 
> the following issues with it:
>
> (a) the whole approach doesn't stand scrutiny for non-linear 
> outputs, e.g. outputting some sort of associative array or 
> really any composite type quickly becomes tenuous either with 
> an output range (eager) or with exposing an input range (lazy);
>
> (b) makes the style of programming without GC radically 
> different, and much more cumbersome, than programming with GC; 
> as a consequence, programmers who consider changing one 
> approach to another, or implementing an algorithm neutral to 
> it, are looking at a major rewrite;
>
> (c) would make D/@nogc a poor cousin of C++. This is quite out 
> of character; technically, I have long gotten used to seeing 
> most elaborate C++ code like poor emulation of simple D idioms. 
> But C++ has spent years and decades taking to perfection an 
> approach without a tracing garbage collector. A departure from 
> that would need to be superior, and that doesn't seem to be the 
> case with range-based approaches.

I agree with this, too.

>
> ===========
>
> Now that we clarified that these existing attempts are not 
> going to work well, the question remains what does. For Phobos 
> I'm thinking of defining and using three policies:
>
> enum MemoryManagementPolicy { gc, rc, mrc }
> immutable
>     gc = MemoryManagementPolicy.gc,
>     rc = MemoryManagementPolicy.rc,
>     mrc = MemoryManagementPolicy.mrc;
>
> The three policies are:
>
> (a) gc is the classic garbage-collected style of management;
>
> (b) rc is a reference-counted style still backed by the GC, 
> i.e. the GC will still be able to pick up cycles and other 
> kinds of leaks.
>
> (c) mrc is a reference-counted style backed by malloc.
>
> (It should be possible to collapse rc and mrc together and make 
> the distinction dynamically, at runtime. I'm distinguishing 
> them statically here for expository purposes.)
>
> The policy is a template parameter to functions in Phobos (and 
> elsewhere), and informs the functions e.g. what types to 
> return. Consider:
>
> auto setExtension(MemoryManagementPolicy mmp = gc, R1, R2)(R1 path, R2 ext)
> if (...)
> {
>     static if (mmp == gc) alias S = string;
>     else alias S = RCString;
>     S result;
>     ...
>     return result;
> }
>
> On the caller side:
>
> auto p1 = setExtension("hello", ".txt"); // fine, use gc
> auto p2 = setExtension!gc("hello", ".txt"); // same
> auto p3 = setExtension!rc("hello", ".txt"); // fine, use rc
>
> So by default it's going to continue being business as usual, 
> but certain functions will allow passing in a (defaulted) 
> policy for memory management.

This, however, I disagree with strongly. For one thing (as others 
have already noted), it would make the functions' implementations 
extremely ugly (`static if` hell) and harder to unit test, and 
from a user's point of view it is very tedious and might 
interfere badly with UFCS.

But more importantly, IMO, it's the wrong thing to do. These 
functions shouldn't know anything about memory management policy 
at all. They allocate, which means they need to know about 
_allocation_ policy, but memory _management_ policy needs to be 
decided by the user.

Now, your suggestion in a way still leaves that decision to the 
user, but does so in a very intrusive way, by passing a template 
flag. This is clearly a violation of the separation of concerns. 
Unusually, it is implementation details of the user's code that 
leak into the library code rather than the other way round, but 
that is just as bad.

I'm convinced this isn't necessary. Let's take `setExtension()` 
as an example, standing in for any of a class of similar 
functions. This function allocates memory, returns it, and 
abandons it; it gives up ownership of the memory. The fact that 
the memory has been freshly allocated means that it is (head) 
unique, and therefore the caller (= library user) can take over 
the ownership. This, in turn, means that the caller can decide 
how she wants to manage it.
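
D already tracks a limited form of this uniqueness today. The result of a strongly `pure` function (one whose parameters have no mutable indirections) cannot alias anything else, so the caller may receive it as either mutable or immutable. This is a minimal sketch. `withExtension` is a hypothetical stand-in for `setExtension()`, and letting the caller choose between GC and RC would need more machinery than is shown here:

```d
// Hypothetical stand-in for setExtension(): it allocates a fresh array,
// fills it, and gives up ownership by returning it. Because it is
// strongly pure (only immutable parameters), the compiler knows the
// returned array is unique.
char[] withExtension(string path, string ext) pure
{
    auto result = new char[](path.length + ext.length);
    result[0 .. path.length] = path[];
    result[path.length .. $] = ext[];
    return result;
}

unittest
{
    // The caller, not the function, decides what the result becomes:
    string frozen = withExtension("hello", ".txt"); // implicitly immutable
    char[] scratch = withExtension("hello", ".txt"); // or kept mutable
    scratch[0] = 'j';
    assert(frozen == "hello.txt");
    assert(scratch == "jello.txt");
}
```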

(I'll try to make a sketch on how this can be implemented in 
another post.)

As a conclusion, I would say that APIs should strive for the 
following principles, in this order:

1. Avoid allocation altogether, for example by laziness (ranges), 
or by accepting sinks.

2. If allocations are necessary (or desirable, to make the API 
more easily usable), try hard to return a unique value (this of 
course needs to be expressed in the return type).

3. If both of the above fail, only then return a GCed pointer, 
or alternatively provide several variants of the function (though 
this shouldn't be necessary often). An interesting alternative: 
instead of passing a flag directly describing the policy, pass 
the function a type that it should wrap its return value in.
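
Principle 1 and the wrapper-type alternative from principle 3 might look roughly like this. `setExtensionTo`, `setExtensionAs` and `Owned` are hypothetical names, while `appender` and `put` are the existing Phobos primitives:

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

// Principle 1: no allocation inside the function. The caller supplies a
// sink (any output range of char) and decides where the characters go.
void setExtensionTo(Sink)(ref Sink sink, string path, string ext)
    if (isOutputRange!(Sink, char))
{
    put(sink, path);
    put(sink, ext);
}

// Principle 3, alternative: the caller names the type the result is
// wrapped in, and the function knows nothing about the policy.
auto setExtensionAs(Wrapper)(string path, string ext)
{
    return Wrapper(path ~ ext);
}

// Hypothetical wrapper standing in for, e.g., a reference-counted handle.
struct Owned(T)
{
    T payload;
}

unittest
{
    auto app = appender!string();
    setExtensionTo(app, "hello", ".txt");
    assert(app.data == "hello.txt");

    auto w = setExtensionAs!(Owned!string)("hello", ".txt");
    assert(w.payload == "hello.txt");
}
```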

As for the _allocation_ strategy: it does need to be 
configurable, but here the same objections against a template 
parameter apply. As the allocator doesn't necessarily need to be 
part of the type, a thread-local (or truly global) variable can 
be used to specify it. This lends itself well to idioms like

     with(MyAllocator alloc) {
         // ...
     }
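
Here is a minimal sketch of that idea. It assumes a hypothetical `Allocator` interface and a hypothetical `withAllocator` helper that stands in for the proposed `with` syntax. Module-level variables in D are thread-local by default, so each thread gets its own current allocator. `scope (exit)` restores the previous allocator even if the body throws.

```d
// Hypothetical allocator interface, for illustration only.
interface Allocator
{
    void[] allocate(size_t size);
}

// Default: plain GC allocation.
class DefaultAllocator : Allocator
{
    void[] allocate(size_t size) { return new ubyte[](size); }
}

// Counts its uses, to make the effect of switching visible.
class CountingAllocator : Allocator
{
    size_t count;
    void[] allocate(size_t size)
    {
        ++count;
        return new ubyte[](size);
    }
}

// Module-level variables are thread-local in D, so this is per thread.
Allocator currentAllocator;

// Module constructors run once per thread and install the default.
static this()
{
    currentAllocator = new DefaultAllocator;
}

// Stand-in for `with (MyAllocator alloc) { ... }`: installs `alloc`
// for the duration of `dg`, then restores the previous allocator.
void withAllocator(Allocator alloc, scope void delegate() dg)
{
    auto saved = currentAllocator;
    currentAllocator = alloc;
    scope (exit) currentAllocator = saved;
    dg();
}

// A library function that allocates through the current allocator,
// without taking an allocator template parameter.
char[] makeBuffer(size_t n)
{
    return cast(char[]) currentAllocator.allocate(n);
}

unittest
{
    auto counting = new CountingAllocator;
    withAllocator(counting, {
        auto buf = makeBuffer(16);
        assert(buf.length == 16);
    });
    assert(counting.count == 1);
    // The default allocator is back in place.
    assert(cast(DefaultAllocator) currentAllocator !is null);
}
```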

>
> Destroy!

Done :-)