How does D compare to Go when it comes to C interop?

Thu Dec 10 21:05:29 PST 2015

On Fri, 11 Dec 2015 01:25:24 +0000, Ola Fosheim Gr wrote:

> it does not work for 64 bit either. You will:
>
> 1. Kill caches and TLB

Which only affects efficiency, not correctness. And that's only for 
people who want to use as much as a gigabyte of stack space for every 
fiber.

Since the TLB is a cache based on usage, that part only applies when you 
are touching tons of far-flung regions of memory at the same time. 
If that's your usage pattern, it doesn't matter whether you're on a 
native thread stack or you're storing things on the heap or you're using 
a memory-mapped fiber stack; you're going to have a bad time.

The only thing you can do in application code to improve the situation is 
to use larger pages. For instance, you can pass MAP_HUGETLB to mmap(2), 
which should enable 2-4MB pages in place of 4-8KB ones (depending on 
hardware support and kernel configuration).

Finally, the *best* you can do to tell the OS that you want lazily 
allocated physical memory with a stack-like access pattern is mmap, 
possibly passing MAP_GROWSDOWN, which is pretty much intended for this 
sort of use case.

For caching, the major problem we see here is that the physical memory 
behind a very large stack is not contiguous. The only thing you can do to 
improve your situation is to request larger pages, as with MAP_HUGETLB.

> 2. Get bloated page tables.

Also affecting efficiency rather than correctness. Also not in the common 
case. Also potentially improved with MAP_HUGETLB, depending on OS 
internals. (The kernel might sometimes have a different store of large 
pages, or sometimes it may stitch together multiple adjacent normal 
pages.)

> 3. Run out of memory.
> 4. Run out of stack space.

At some point, you have to write code against the system you're using, 
not some idealized computer with infinite resources.

You can use very large stacks in your fibers, but you need to ensure that 
they're short-lived. You can operate recursively on moderately large 
datasets, but not on arbitrarily large ones. These are considerations you 
have to take into account in D.

You have to consider the same things in Go because memory is a limited 
resource. Sometimes you can address them in different ways.

> These 3 approaches work:
> 
> 1. Allocate all activation records on the heap (Simula/Beta)

Or rather, allow a fragmented stack, in both physical and virtual memory. 
Don't even bother giving the kernel any hints about probable access 
patterns. This has an obvious negative impact on performance, and that 
applies to the common case as well as unusual ones.

> 2. Use stacks that grows and shrink dynamically (Go)

This has most of the problems you complain about. (Go doesn't even have 
unlimited stack sizes; see 
https://golang.org/pkg/runtime/debug/#SetMaxStack .)

Furthermore, Go's implementation (based on comments in the source code) 
requires that every function check that the stack is large enough. Every 
single function call. Even if that were a good solution, it's not going 
to be added to D.

In order to keep the stack contiguous, Go *reallocates and copies your 
entire stack*, then walks through it to fix up every pointer. This is 
only even possible because you can't store stack pointers on the heap, 
apparently. (I don't know how that's enforced.)
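
To make the cost concrete, here is a toy model of copy-and-fix-up in C. 
This is not Go's runtime code: the struct, the function name, and 
especially the way pointers are detected are assumptions for 
illustration. It treats any word whose value falls inside the old stack 
as a pointer, where a real runtime would need precise information about 
which slots hold pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model: a downward-growing stack of machine words. */
struct toy_stack {
    uintptr_t *lo;    /* lowest address of the allocation */
    size_t     words; /* capacity in words */
    size_t     used;  /* live words, occupying the top of the allocation */
};

/* Double the capacity, copy the live words to the top of the new
 * allocation, and relocate every word that points into the old stack.
 * Returns 0 on success, -1 if allocation fails. */
int toy_stack_grow(struct toy_stack *s)
{
    assert(s->used <= s->words);

    size_t new_words = s->words * 2;
    uintptr_t *new_lo = malloc(new_words * sizeof *new_lo);
    if (new_lo == NULL)
        return -1;

    uintptr_t *old_live = s->lo + (s->words - s->used);
    uintptr_t *new_live = new_lo + (new_words - s->used);
    memcpy(new_live, old_live, s->used * sizeof *new_live);

    /* Address range of the live part of the old stack. */
    uintptr_t old_begin = (uintptr_t)old_live;
    uintptr_t old_end   = (uintptr_t)(s->lo + s->words);
    uintptr_t delta     = (uintptr_t)new_live - old_begin;

    /* Walk the whole copied stack and shift anything that looks like a
     * pointer into the old stack. */
    for (size_t i = 0; i < s->used; i++) {
        if (new_live[i] >= old_begin && new_live[i] < old_end)
            new_live[i] += delta;
    }

    free(s->lo);
    s->lo = new_lo;
    s->words = new_words;
    return 0;
}
```

Both the copy and the fix-up walk are linear in the amount of live 
stack, and any pointer into the stack held somewhere else (the heap, 
say) would be left dangling, which is why such pointers have to be 
ruled out.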

This is, needless to say, expensive. But at least it's amortized, right?

If you want to make this work from D, you would have to do something a 
bit more awkward. Maybe create a shared memory object, then mmap it 
multiple times at different sizes. Pass it back and forth between two 
ranges of virtual memory. It would be ugly, and when you're unlucky, you 
won't have enough virtual address space in the right places.
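
Something like the following, sketched in C rather than D and with 
everything named hypothetically. I'm using an unlinked temp file in 
/tmp as the backing object to keep the example portable; shm_open would 
be the more obvious choice for a real shared memory object. Sizes are 
assumed to be multiples of the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: one backing object that can be mapped at several sizes. */
struct stack_backing {
    int    fd;
    size_t capacity; /* maximum stack size, a multiple of the page size */
};

/* Create the backing object. Returns 0 on success, -1 on failure. */
int backing_create(struct stack_backing *b, size_t capacity)
{
    char path[] = "/tmp/fiber-stack-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the object lives on until the fd and mappings go away */
    if (ftruncate(fd, (off_t)capacity) != 0) {
        close(fd);
        return -1;
    }
    b->fd = fd;
    b->capacity = capacity;
    return 0;
}

/* Map the top `size` bytes of the object. The stack grows down, so the
 * live data sits at the high end and every view ends at the same offset.
 * Returns NULL on failure. */
void *backing_map_top(const struct stack_backing *b, size_t size)
{
    assert(size > 0 && size <= b->capacity);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   b->fd, (off_t)(b->capacity - size));
    return p == MAP_FAILED ? NULL : p;
}
```

To "grow" a stack you would map a larger view with backing_map_top, 
switch over to it, and munmap the smaller one. The contents carry over 
without a copy, but the addresses still change, so this only moves the 
awkwardness around.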

> 3. Require no state on stack at yield. (Pony / C++17)

Which limits their utility immensely.