How does D compare to Go when it comes to C interop?

Fri Dec 11 01:46:27 PST 2015

On Friday, 11 December 2015 at 05:05:29 UTC, Chris Wright wrote:
>> 1. Kill caches and TLB
>
> Which only affects efficiency, not correctness.

That's true, but when you have fibers or coroutines as a 
paradigm, you do it because it is a convenient way of preserving 
statefulness. So you want to be able to use it where it makes 
your code more maintainable. If you only care about efficiency 
for a very limited scenario, then you don't pick coroutines, you 
use events. But having many coroutines, including short-lived 
ones, makes it easier to write code that is evolving, like 
simulations. That makes requiring syscalls like mmap for 
instantiation way too expensive, although if all your stacks are 
the same size, you can just use a freelist pool and avoid the 
syscalls. But that is not a good generic solution. That's a 
special case.
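
To make the freelist special case concrete, here is a minimal sketch 
in C, assuming POSIX mmap and one fixed stack size. The names 
(stack_pool, STACK_SIZE, mmap_calls) are made up for illustration, 
and it is not thread-safe:

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical fixed size: every stack in the pool is identical. */
#define STACK_SIZE (64 * 1024)

/* An unused stack stores the freelist link in its own memory. */
typedef struct free_stack {
    struct free_stack *next;
} free_stack;

typedef struct {
    free_stack *head;  /* singly linked list of unused stacks */
    size_t mmap_calls; /* how often we actually went to the kernel */
} stack_pool;

/* Hand out a stack, reusing a cached one when possible.
 * Returns NULL if the kernel refuses the mapping. */
void *stack_pool_acquire(stack_pool *pool)
{
    assert(pool != NULL);
    if (pool->head != NULL) {
        free_stack *s = pool->head;
        pool->head = s->next;
        return s; /* no syscall on this path */
    }
    void *mem = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    pool->mmap_calls++;
    return mem;
}

/* Return a stack to the pool instead of unmapping it. */
void stack_pool_release(stack_pool *pool, void *stack)
{
    assert(pool != NULL && stack != NULL);
    free_stack *s = stack;
    s->next = pool->head;
    pool->head = s;
}
```

Once the pool is warm, acquire/release is a couple of pointer 
moves, but it only works because every stack has the same size.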

> memory all together. If that's your usage pattern, it doesn't 
> matter whether you're on a native thread stack or you're 
> storing things on the heap or you're using a memory-mapped 
> fiber stack; you're going to have a bad time.

It matters if you keep wiping the caches/TLB by hammering the 
page tables with changes. It matters if you need to use small 
pages because you need a guard page at the bottom of the stack 
in order to avoid checking the stack size. It matters if page 
tables grow in size because you fragment memory deliberately. 
And you also need to make sure that code probes the guard page 
before addressing something beyond a potential guard page (e.g. 
if you put a large array on the stack).
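
A rough sketch of the guard page plus probing, assuming POSIX 
mmap/mprotect and a stack that grows downwards. guarded_stack and 
probe_frame are hypothetical names. A real compiler emits the probe 
inline in the function prologue; this only shows the access pattern:

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    char *base;  /* start of the mapping; the first page is the guard */
    size_t size; /* total mapping size, guard page included */
    size_t page; /* page size reported by the system */
} guarded_stack;

/* Map usable_pages of stack plus one inaccessible guard page below. */
int guarded_stack_init(guarded_stack *s, size_t usable_pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || usable_pages == 0)
        return -1;
    size_t total = (usable_pages + 1) * (size_t)page;
    char *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    /* Any access to the lowest page now faults. */
    if (mprotect(mem, (size_t)page, PROT_NONE) != 0) {
        munmap(mem, total);
        return -1;
    }
    s->base = mem;
    s->size = total;
    s->page = (size_t)page;
    return 0;
}

void guarded_stack_destroy(guarded_stack *s)
{
    munmap(s->base, s->size);
}

/* Before using a frame of `frame` bytes below sp, touch one byte per
 * page on the way down. Since the stride is exactly one page, an
 * oversized frame cannot jump over the guard page; it faults on it. */
char *probe_frame(char *sp, size_t frame, size_t page)
{
    volatile char *p = sp;
    size_t left = frame;
    while (left > page) {
        p -= page;
        left -= page;
        *p = 0;
    }
    if (left > 0) {
        p -= left;
        *p = 0;
    }
    return (char *)p; /* new stack pointer */
}
```

Without the probe, a large array on the stack could put the first 
access well below the guard page, and the overflow would go 
unnoticed.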

> You have to consider the same things in Go because memory is a 
> limited resource. Sometimes you can address them in different 
> ways.

Yes, it is a limited resource, especially in typical Go scenarios 
where you run on shared instances with a fixed small memory size. 
That basically makes small default stacks that grow a decent 
solution, although it does make GC questionable, as it leads to 
significant memory overhead.

>> 1. Allocate all activation records on the heap (Simula/Beta)
>
> Or rather, allow a fragmented stack, in both physical and 
> virtual memory. Don't even bother giving the kernel any hints 
> about probable access patterns. This has an obvious negative 
> impact on performance, and that applies to the common case as 
> well as unusual ones.

This is basically the model most high-level languages take on the 
conceptual level; you then do optimizations under the hood. 
Basically, having the same model for objects, functions, lambdas 
and coroutines is a big win in many ways. You can still have a 
LIFO allocator under the hood for "stack-like allocation".
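
As a rough illustration of a LIFO allocator under the hood, here 
is a minimal bump allocator in C. lifo_arena, lifo_push and 
lifo_pop are hypothetical names, and the caller is assumed to 
supply a buffer aligned for max_align_t:

```c
#include <assert.h>
#include <stddef.h>

/* Bump allocator over a caller-supplied, suitably aligned buffer. */
typedef struct {
    unsigned char *buf; /* backing storage */
    size_t cap;         /* size of buf in bytes */
    size_t top;         /* offset of the first free byte */
} lifo_arena;

/* Allocate an activation record; returns NULL when the arena is full. */
void *lifo_push(lifo_arena *a, size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded < size || rounded > a->cap - a->top)
        return NULL; /* overflow or out of space */
    void *record = a->buf + a->top;
    a->top += rounded;
    return record;
}

/* Free `record` and everything pushed after it (strict LIFO order). */
void lifo_pop(lifo_arena *a, void *record)
{
    unsigned char *p = record;
    assert(p >= a->buf && p <= a->buf + a->top);
    a->top = (size_t)(p - a->buf);
}
```

Records that have to outlive their caller (a suspended coroutine, 
a captured lambda) would go to the general heap instead. The 
conceptual model stays the same either way.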

New features in C++ are taking the everything-is-an-object 
approach. Lambdas are objects. Coroutines are objects.

Is it more difficult to get the highest performance? Yes, but it 
is memory efficient and conceptually elegant.

> In order to keep the stack contiguous, Go *reallocates and 
> copies your entire stack*, then walks through it to fix up

Yes, but one can easily think of optimizations, e.g. leave open 
slots so that you statistically often can just extend the stack. 
Or the opposite, over-allocate and shrink when you know what the 
stack will be like. One problem with Go there could be the focus 
on separate compilation; smart behaviour here probably requires 
full analysis of possible call chains.

> If you want to make this work from D, you would have to do 
> something a bit more awkward.

One general problem for D is that you can call D from C. If you 
knew that C code could only be called at the leaves (or rather, 
only kept state at the bottom of the stack), then you would also 
get more creative freedom.

>> 3. Require no state on stack at yield. (Pony / C++17)
>
> Which limits their utility immensely.

Not really. It may affect execution speed, but the basic idea is 
that you establish by static analysis what state is to be 
retained in the heap object; the rest is put on the regular 
thread stack.
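
A hand-written sketch in C of what such a transformation could 
produce. The counter_coro type and its functions are invented for 
illustration, and a compiler would derive the split between heap 
state and stack temporaries by analysis rather than by hand:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Heap object: holds only the state that must survive a yield. */
typedef struct {
    int resume_point; /* 0 = not started, 1 = after yield, 2 = done */
    int next;         /* next input value */
    int limit;        /* exclusive upper bound */
} counter_coro;

counter_coro *counter_coro_new(int start, int limit)
{
    counter_coro *c = malloc(sizeof *c);
    if (c != NULL) {
        c->resume_point = 0;
        c->next = start;
        c->limit = limit;
    }
    return c;
}

/* Resume the coroutine. Returns true and stores the next square in
 * *out when it yields, false once it has finished. */
bool counter_coro_resume(counter_coro *c, int *out)
{
    int squared; /* temporary on the regular thread stack,
                    never live across a yield */
    switch (c->resume_point) {
    case 0:
        while (c->next < c->limit) {
            squared = c->next * c->next;
            *out = squared;
            c->next++;
            c->resume_point = 1;
            return true; /* yield: nothing left on the stack */
    case 1:;             /* execution continues here on resume */
        }
        c->resume_point = 2;
        /* fall through */
    default:
        return false;
    }
}
```

The only state carried across a yield is the three ints in the 
heap object. The temporary squared lives on whatever thread stack 
the resumer happens to be running on.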