The non allocating D subset

Sat Jun 1 07:40:09 PDT 2013

On Saturday, 1 June 2013 at 05:45:38 UTC, SomeDude wrote:
> Basically it is a non allocating D subset.

Not necessarily nonallocating, but it doesn't use a gc. I just 
updated the zip:

http://arsdnet.net/dcode/minimal.zip

If you make the program (only works on linux btw) and run it, 
you'll see a bunch of allocations fly by, but they are all done 
with malloc/free and their locations are pretty predictable.

The file minimal.d is a test program and you can see a lot of D 
works, including features like classes, exceptions, templates, 
structs, and delegates. (Heap-allocated delegates should be 
banned but aren't, so if you create a built-in one it will leak. 
The helper file, memory.d, does contain a HeapDelegate struct 
that refcounts and frees it, so the concept is still usable.)

The other cool thing is since the library is so minimal, the 
generated executable is small too. Only about 30 KB with an empty 
main(), and no outside dependencies. A cool fact about that is 
you can compile it and run on bare metal (given a bootloader like 
grub) too, and it all just works.

You can also make "LIBC=yes" and depend on the C library, which 
makes things work better - there's a real malloc function there! 
- and adds about 10 KB to the executable. On the desktop at 
least, that's probably a more realistic way to use it than 
totally standalone.

But yeah, I haven't written any real code with this, but so far 
it seems to be pretty usable.

I also talked a while on the reddit thread last night about this, 
so let me copy/paste that here too:

Yes, certainly. And it wouldn't even necessarily be no array 
concats, just you wouldn't want to use the built-in ones.

Some features that use the gc in the real druntime don't 
necessarily have to. You'll need to be aware of this most of the 
time so you can free the memory in your app, but you can have a 
pretty good idea of when it will happen. One example is new on a 
class. If that mallocs, you just match every new with a delete 
(or a call to free_obj() or whatever) and you'll be fine, just 
like C++.
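
To make the new/delete pairing concrete, here is a minimal sketch written against stock druntime and Phobos rather than the custom runtime in the zip. mallocNew and mallocDelete are hypothetical helper names (they are not the free_obj() mentioned above), and Widget is only a demo class.

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

// Demo class; any class without GC-owned members would do.
class Widget {
    int id;
    this(int id) { this.id = id; }
}

// Hypothetical helper: construct a class instance in malloc'd memory.
T mallocNew(T, Args...)(Args args) if (is(T == class)) {
    enum size = __traits(classInstanceSize, T);
    void* raw = malloc(size);
    assert(raw !is null, "out of memory");
    return emplace!T(raw[0 .. size], args);
}

// Hypothetical helper: run the destructor, then release the memory.
void mallocDelete(T)(T obj) if (is(T == class)) {
    if (obj is null) return;
    destroy(obj);
    free(cast(void*) obj);
}

void main() {
    auto w = mallocNew!Widget(42); // plays the role of `new`
    assert(w.id == 42);
    mallocDelete(w);               // the matching `delete`
}
```

Every mallocNew needs exactly one mallocDelete, the same discipline as new/delete in C++.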

I played with one I wasn't sure would work earlier, but now think 
it can: heap closures. scope delegates are easy, since they don't 
allocate, but heap closures allocate automatically and don't give 
much indication that they do. If you are careful, though, the 
rules can be followed (a delegate that accesses an outside scope 
and has its address taken, its reference copied, or is passed to 
a function will automatically allocate), and you can manually 
call free(dg.ptr); when you're done with it.
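
The difference between the two delegate kinds can be sketched as below. callNow and makeCounter are hypothetical names. The code runs on stock druntime, where the closure frame belongs to the GC, so the free(dg.ptr) step is left as a comment: it only applies to a malloc-backed runtime like the one described in this post.

```d
// A `scope` delegate parameter promises not to escape the delegate,
// so the caller's locals can stay on the stack: no allocation.
int callNow(scope int delegate() dg) {
    return dg();
}

// Returning a delegate that uses `count` lets it escape, so the
// compiler moves `count` into a heap-allocated closure frame.
int delegate() makeCounter() {
    int count = 0;
    return () => ++count;
}

void main() {
    int local = 5;
    assert(callNow(() => local + 1) == 6); // stack only

    auto dg = makeCounter();               // hidden allocation here
    assert(dg.ptr !is null);               // dg.ptr is the closure frame
    assert(dg() == 1);
    assert(dg() == 2);

    // On the post's malloc-backed runtime (an assumption, do NOT do
    // this with the stock GC runtime) you would finish with:
    //     free(dg.ptr);
}
```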

I think it is probably safer to just disallow them, either by not 
implementing _d_allocmemory in druntime (thus if you accidentally 
use it, you'll get a linker error about the missing function), 
or, and this is tricky right now but not actually impossible, use 
compile time reflection to scan your methods and members for a 
non-scope delegate reference and throw an error.
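
A very reduced sketch of the reflection idea, under stated assumptions: it only looks at the fields of one aggregate type and flags any delegate-typed member, without telling scope and non-scope uses apart or scanning method bodies. hasDelegateField, Safe, and Risky are hypothetical names.

```d
// True if any field of T has a delegate type.
template hasDelegateField(T) {
    enum bool hasDelegateField = () {
        bool found = false;
        // typeof(T.tupleof) is the compile-time list of field types.
        foreach (F; typeof(T.tupleof)) {
            static if (is(F == delegate))
                found = true;
        }
        return found;
    }();
}

struct Safe  { int x; char[] name; }
struct Risky { int x; void delegate() callback; }

// A library could put a check like this next to each type it defines.
static assert(!hasDelegateField!Safe);
static assert( hasDelegateField!Risky);

void main() {}
```

A real check would have to walk nested types and function parameters as well, which is where it gets tricky.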

If we do the latter, a heap delegate can actually be allowed in a 
fairly safe way, by wrapping it in a struct. Usage of 
HeapDelegate!T will be pretty obvious, so you aren't going to 
accidentally use it the way you might the more sugary built in. I 
have a proof of concept implemented in my local copy of minimal.d 
that automatically refcounts the delegate, freeing it when the 
last reference goes out of scope.

Array slices are ok the way I have them implemented now: the 
built in concat function is missing, so if you try a ~ b, it will 
be a linker error (including the source file and line number btw, 
easy enough to handle). No allocation there. The biggest risk is 
lifecycle management, and the rule there is you don't own slices 
(non-immutable ones at least). I'd like the compiler to implement 
a check on this, but right now it doesn't. It's not a hard coding 
convention to follow though.

Built in new array[] is not implemented, meaning it is a linker 
error, because such arrays are indistinguishable from slices 
type-wise. 
(In theory it could be like classes, where you just know to 
manually free them, but if you have a char[] member, are you sure 
that was new'd or is it holding a slice someone else owns? Let's 
just avoid it entirely.)

But, this doesn't mean we can't have some of D's array 
convenience! In minimal.d, you can see a StackArray!T struct and 
maybe, not sure if I put it in that zip or not, a HeapArray!T 
struct. These types own their memory, stack of course going away 
with scope, and heap being automatically reference counted to 
call free() when all copies are gone, and overload a few 
operators for convenience:

alias this is to a slice function, so you can do char[] slice = 
myCustomArray; You can't change the original pointer through that 
slice, so no risk of it losing the memory.
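
A minimal sketch of what such a container could look like, not the code from minimal.d. Taking the capacity as a second template parameter is an assumption (the post writes StackArray!T), and all member names are hypothetical.

```d
struct StackArray(T, size_t capacity) {
    private T[capacity] storage; // lives wherever the struct lives
    private size_t used;

    // View of the filled part. Callers borrow it; they can't resize
    // or repoint the container through it.
    T[] slice() { return storage[0 .. used]; }
    alias slice this;

    // arr ~= element, up to the fixed capacity
    void opOpAssign(string op : "~")(T value) {
        assert(used < capacity, "StackArray is full");
        storage[used++] = value;
    }

    // arr ~= otherSlice, copied into our own storage
    void opOpAssign(string op : "~")(const(T)[] values) {
        assert(values.length <= capacity - used, "StackArray is full");
        storage[used .. used + values.length] = values[];
        used += values.length;
    }
}

void main() {
    StackArray!(char, 16) name;
    name ~= 'D';
    name ~= " rox";
    char[] view = name;        // implicit conversion via alias this
    assert(view == "D rox");
    assert(name.length == 5);  // slice properties are forwarded too
}
```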

The ~= operator is implemented too on the *Array containers. They 
know their length and their capacities, and you can append up to 
the capacity. (The HeapArray could also realloc() as needed, but 
right now I don't.) One important difference from regular D 
arrays, though. In regular D:

string a = "hey"; string b = a; b ~= " man";
assert(a == "hey" && b == "hey man");

Appending to the second one doesn't change the first one. It may 
allocate as needed (see this for details: 
http://dlang.org/d-array-article.html )

Whereas with a HeapArray or StackArray, they share the same 
underlying data, so appending to one reference would append to 
all. I think that's OK though, because we have helper things like 
const to avoid that, and they are a custom type, so they are 
allowed to work differently than the built ins.
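
The shared-append behavior can be sketched with a small reference-counted container. This is an illustration under assumptions, not the HeapArray from the zip: the Block layout and member names are made up, it only handles plain-data element types, and it needs a nonzero capacity.

```d
import core.stdc.stdlib : malloc, free;

struct HeapArray(T) {
    // Shared bookkeeping; every copy points at the same Block.
    private static struct Block {
        size_t refs;
        size_t used;
        size_t capacity;
        T* data;
    }
    private Block* block;

    this(size_t capacity) {
        block = cast(Block*) malloc(Block.sizeof);
        assert(block !is null, "out of memory");
        block.refs = 1;
        block.used = 0;
        block.capacity = capacity;
        block.data = cast(T*) malloc(T.sizeof * capacity);
        assert(block.data !is null, "out of memory");
    }

    // Copying adds a reference instead of duplicating the data.
    this(this) { if (block) ++block.refs; }

    // The last copy to go away frees everything.
    ~this() {
        if (block && --block.refs == 0) {
            free(block.data);
            free(block);
        }
        block = null;
    }

    T[] slice() { return block ? block.data[0 .. block.used] : null; }
    alias slice this;

    void opOpAssign(string op : "~")(T value) {
        assert(block && block.used < block.capacity, "HeapArray is full");
        block.data[block.used++] = value;
    }
}

void main() {
    auto a = HeapArray!char(8);
    a ~= 'h';
    auto b = a;          // second reference to the same storage
    b ~= 'i';            // appends for both a and b
    char[] viaA = a;
    assert(viaA == "hi");
}
```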

Thankfully, btw, static arrays are a different type. They can be 
permitted with ease.

I didn't implement a ~ b. I think the result would be too easy 
to lose track of: you'd either pointlessly malloc/free in the 
middle of an expression, or just forget to free entirely. Maybe 
it could be done too, though.

Another issue is strings. In phobos and druntime both, there's a 
lot of creating strings on the heap and returning them from 
functions. e.g. to!string(10) returns a brand new, freshly 
allocated string "10". We don't want that in our library, so it 
looks a little more like C, but slices make it easier to manage. 
The analogous function I wrote is

char[] intToString(int a, char[] buffer)

You pass it an area of pre-allocated buffer to write to. 
buffer.length tells it where it isn't allowed to continue (unlike 
a plain char* in C). It returns the slice of your own buffer that 
was actually used. So writeln(10) becomes:

char[16] buffer;
write(intToString(10, buffer), "\n");

a little more verbose, but there's no mystery there about the 
memory. intToString knows it only has 16 spaces to work with, you 
know exactly where it is going, no allocations, and the return 
value conveniently has the length used too, so we can pass it 
directly to another function. (as long as that function doesn't 
store the reference!)
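
One way a function with that signature could be written is sketched below. The body is an assumption, not the implementation from the zip. In particular, returning null when the buffer is too small is a choice made for this example.

```d
// Writes the decimal form of `value` into `buffer` and returns the
// part of `buffer` that was used. Returns null if it doesn't fit
// (an assumption; the original's error handling isn't shown).
char[] intToString(int value, char[] buffer) {
    const bool negative = value < 0;
    // Go through long so that int.min can be negated safely.
    uint magnitude = negative ? cast(uint)(-cast(long) value)
                              : cast(uint) value;

    char[11] tmp;               // sign + 10 digits is the worst case
    size_t pos = tmp.length;
    do {                        // fill tmp from the right
        tmp[--pos] = cast(char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    if (negative) tmp[--pos] = '-';

    const size_t len = tmp.length - pos;
    if (len > buffer.length) return null;
    buffer[0 .. len] = tmp[pos .. $];
    return buffer[0 .. len];
}

void main() {
    char[16] buffer;
    assert(intToString(10, buffer) == "10");
    assert(intToString(-42, buffer) == "-42");
    assert(intToString(int.min, buffer) == "-2147483648");

    char[1] tiny;
    assert(intToString(10, tiny) is null); // too small, nothing written
}
```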

Built in AAs? Not implemented. But we could do a library AA just 
as easily, and thanks to overloaded operators, it would be pretty 
too.
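
As a rough illustration of how operator overloading keeps a library map pretty, here is a fixed-capacity map with linear search. SmallMap and its members are hypothetical, and a real library AA would want hashing. The operators are the point here.

```d
struct SmallMap(K, V, size_t capacity) {
    private K[capacity] keys;
    private V[capacity] values;
    private size_t count;

    // `key in map` gives a pointer to the value, or null.
    V* opBinaryRight(string op : "in")(K key) {
        foreach (i; 0 .. count)
            if (keys[i] == key) return &values[i];
        return null;
    }

    // map[key] = value (insert or overwrite)
    void opIndexAssign(V value, K key) {
        if (auto p = key in this) { *p = value; return; }
        assert(count < capacity, "SmallMap is full");
        keys[count] = key;
        values[count] = value;
        ++count;
    }

    // map[key]
    ref V opIndex(K key) {
        auto p = key in this;
        assert(p !is null, "key not found");
        return *p;
    }

    size_t length() const { return count; }
}

void main() {
    SmallMap!(string, int, 4) ages;
    ages["alice"] = 30;
    ages["bob"] = 25;
    ages["alice"] = 31;             // overwrite, no new slot
    assert(ages["alice"] == 31);
    assert(ages.length == 2);
    assert(("carol" in ages) is null);
}
```

The map stores key slices without owning them, so the "you don't own slices" rule from earlier applies to the keys as well.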

Another issue is exceptions. They work, and must be classes. So 
where do you free them? I haven't tried this yet, but I'm pretty 
sure you can just do it when you catch() it, and no problem.
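
The free-on-catch idea could look roughly like this. It is a sketch against stock druntime and Phobos, where the exception is placed in malloc'd memory by hand; ParseError and fail are hypothetical names, and how the custom runtime in the zip allocates exceptions is not shown here.

```d
import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

class ParseError : Exception {
    this(string msg) { super(msg); }
}

// Throws an exception object that lives in malloc'd memory.
void fail() {
    enum size = __traits(classInstanceSize, ParseError);
    void* raw = malloc(size);
    assert(raw !is null, "out of memory");
    throw emplace!ParseError(raw[0 .. size], "bad input");
}

void main() {
    bool caught = false;
    try {
        fail();
    } catch (ParseError e) {
        caught = (e.msg == "bad input");
        // Nobody else holds a reference, so release it right here.
        destroy(e);
        free(cast(void*) e);
    }
    assert(caught);
}
```

This only works if the catch site is the sole owner. Rethrowing or storing the exception would need a different rule.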

Well this is turning into a real beast of a comment, so let me 
sum up and finish: a lot of D can work without the GC. It will 
take some custom types and deliberately missing druntime 
functions to make it pretty, but it leaves us with a language at 
least as usable as C++, with the same idea of no surprise/hidden 
allocations in there. There's still a question of having stuff we 
don't necessarily want like RTTI, but its impact can be 
minimized so I think it will be ok too. (Oh btw, since I have a 
custom druntime here, I added some runtime reflection the real D 
doesn't have yet. It rox, and came cheap, but you can still 
version it out.)