Returning variable-sized stack data
IchorDev
zxinsworld at gmail.com
Mon Jul 15 10:34:15 UTC 2024
I mentioned in the monthly meeting how I would like to see a more
convenient way to return variable-sized data to the stack in D.
Walter mentioned that he wouldn't like to break the C ABI, which
is understandable, but you can certainly make this work without a
different ABI. In fact, you can even return variable-sized data
to the stack in C:
```c
#include <alloca.h>
#include <stddef.h>
#include <stdio.h>

struct A{ int a,b,c,d,e,f,g; };
struct B{ int a,b,c; };

//tells the caller how much stack space myFn needs for a given n
size_t myFnRetSize(int n){
    return n == 0 ? sizeof(struct A) : sizeof(struct B);
}

//writes the result into memory that the caller allocated
void myFn(void* mem, int n){
    if(n == 0){
        struct A a = {1,2,3,4,5,6,7};
        *((struct A*)mem) = a;
    }else{
        struct B b = {1,2,7};
        *((struct B*)mem) = b;
    }
}

int main(){
    int n = 1; //<-- can be any number
    size_t size = myFnRetSize(n);
    void* mem = alloca(size); //lives until main returns
    myFn(mem, n);
    //write out the result:
    for(size_t i=0; i<size/sizeof(int); i++){
        printf("%d ", ((int*)mem)[i]);
    }
    printf("\n");
    return 0;
}
```
Sorry if the code is terrible, but hopefully it demonstrates my
point adequately. You might say that this is technically
returning by reference, but at the machine-code level all stack
access is done via pointers.
You might be wondering what the point of having this feature
would even be.
Well, unions always take as much space as their largest member.
If a union contains a struct that's (for example) 512 bytes
large, it will always take 512 bytes, when really we might only
need to store a 4–8 byte number most of the time. For sum types,
variable-sized stack returns could greatly reduce stack
consumption when the member types differ vastly in size and the
smaller ones are used most frequently.
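To make the size cost concrete, here is a minimal sketch in C. The names `Big`, `Small`, `Either` and `wasted_bytes_when_small` are made up for illustration; the only thing relied upon is that a union is at least as large as its largest member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration: one large payload, one small. */
struct Big   { unsigned char bytes[512]; };
struct Small { int value; };

/* A union always reserves room for its largest member. */
union Either {
    struct Big   big;
    struct Small small;
};

/* Checked at compile time: the union can never be smaller than Big. */
static_assert(sizeof(union Either) >= sizeof(struct Big),
              "a union is at least as large as its largest member");

/* Bytes that sit unused whenever the union only holds a Small. */
size_t wasted_bytes_when_small(void){
    return sizeof(union Either) - sizeof(struct Small);
}
```

With a 4-byte `int`, `wasted_bytes_when_small()` comes to 508 bytes for every value that only needed the small member.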
Some might say reference types should be used for such a purpose;
but when you're programming a largely data-driven system that
uses masses of structs, the sheer number of heap allocations
could become a huge performance bottleneck, whereas stack
allocation is practically instant. Of course, you can always
pre-allocate a huge amount of data on the stack, but then a lot
of it will go to waste and your code will be more vulnerable to
stack overflows.
A way of making variable-sized stack returns less cumbersome in D
would be to have some syntactic sugar that works something like
this:
```d
struct A{ int a,b,c,d,e,f,g; }
struct B{ int a,b,c; }

@stackArrayReturn myFn(int n){
    auto nSqr = n * n; //demonstrate how variables can affect the return value
    auto condition = n * n;
    if(condition == 0){
        return A(nSqr+1,2,3,4,5,6,7);
    }else{
        return B(nSqr,2,7);
    }
}

void main(){
    int n = 1; //<-- can be any number
    void[] myMem = myFn(n);
}
```
Which gets lowered to this:
```d
import std.typecons;
struct A{ int a,b,c,d,e,f,g; }
struct B{ int a,b,c; }

//returns the size the caller must allocate; the callback fills that memory later
size_t myFn(
    out return scope void function(void[] memory, Tuple!(int,"nSqr") context) callback,
    out Tuple!(int,"nSqr") context,
    int n
){
    auto nSqr = n * n;
    auto condition = n * n;
    if(condition == 0){
        context = Tuple!(int,"nSqr")(nSqr);
        callback = (void[] m, Tuple!(int,"nSqr") ctx){
            *cast(A*)m.ptr = A(ctx.nSqr+1,2,3,4,5,6,7);
        };
        return A.sizeof;
    }else{
        context = Tuple!(int,"nSqr")(nSqr);
        callback = (void[] m, Tuple!(int,"nSqr") ctx){
            *cast(B*)m.ptr = B(ctx.nSqr,2,7);
        };
        return B.sizeof;
    }
}

void main(){
    int n = 1; //<-- can be any number
    void[] myMem;
    {
        scope void function(void[] memory, Tuple!(int,"nSqr") context) __returnCallback;
        Tuple!(int,"nSqr") __returnContext;
        size_t __returnSize = myFn(__returnCallback, __returnContext, n);
        import core.stdc.stdlib: alloca;
        //allocated in main's frame, so it stays valid until main returns
        myMem = alloca(__returnSize)[0..__returnSize];
        __returnCallback(myMem, __returnContext);
    }
}
```
This is an example of returning one of two different struct
types, but this could also be used to return slices of any type
(e.g. `int[]`).
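As a rough sketch of the slice case, the same size-plus-callback shape can be written by hand in C. Everything here is hypothetical and only for illustration: `Ctx`, `FillFn`, `make_range` and `sum_of_range` are invented names, and `<alloca.h>` is a non-standard header (this assumes a glibc-style system).

```c
#include <alloca.h>   /* non-standard; assumes a glibc-style system */
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: values computed before the size was known. */
struct Ctx { int start; int count; };

/* Callback type: fills caller-provided memory using the saved context. */
typedef void (*FillFn)(void *mem, struct Ctx ctx);

static void fill_range(void *mem, struct Ctx ctx){
    int *out = mem;
    for(int i = 0; i < ctx.count; i++){
        out[i] = ctx.start + i;
    }
}

/* Step 1: decide the size and hand back a callback plus its context. */
size_t make_range(FillFn *callback, struct Ctx *context, int start, int count){
    assert(count >= 0);
    context->start = start;
    context->count = count;
    *callback = fill_range;
    return (size_t)count * sizeof(int);
}

/* Step 2 (caller side): allocate in the caller's own frame, then fill it. */
int sum_of_range(int start, int count){
    FillFn fill;
    struct Ctx ctx;
    size_t size = make_range(&fill, &ctx, start, count);
    int *data = alloca(size);   /* stays valid until sum_of_range returns */
    fill(data, ctx);

    int sum = 0;
    for(size_t i = 0; i < size / sizeof(int); i++){
        sum += data[i];
    }
    return sum;
}
```

The point of the sketch is that `make_range` never touches the array itself; the `alloca` call happens in the frame of `sum_of_range`, which is the function that actually uses the data.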
Ideally there would be a nice way to do this with scoped
delegates, but `alloca` will re-allocate the data that their
context pointer points to. Another less stack-wasteful (albeit
potentially significantly more CPU-wasteful) method would be to
have two completely separate functions: one that determines the
return size, and another that does all the other logic. However,
this method would either limit the function's structure
significantly, or generate wasteful code.
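A minimal sketch of that two-function split, in C, shows where the waste comes from. The names `condition`, `my_fn_size` and `my_fn_fill` are hypothetical, and `condition` is just a stand-in for whatever logic decides which type gets returned.

```c
#include <assert.h>
#include <stddef.h>

struct A { int a, b, c, d, e, f, g; };
struct B { int a, b, c; };

/* Hypothetical stand-in for logic that both functions depend on. */
static int condition(int n){ return n * n; }

/* Function 1: only determines the size of the result. */
size_t my_fn_size(int n){
    return condition(n) == 0 ? sizeof(struct A) : sizeof(struct B);
}

/* Function 2: has to redo the same decision before writing the result. */
void my_fn_fill(void *mem, int n){
    assert(mem != NULL);
    int n_sqr = n * n;
    if(condition(n) == 0){   /* evaluated a second time */
        struct A a = { n_sqr + 1, 2, 3, 4, 5, 6, 7 };
        *(struct A *)mem = a;
    }else{
        struct B b = { n_sqr, 2, 7 };
        *(struct B *)mem = b;
    }
}
```

Here the repeated work is a single multiplication, but anything that feeds into the size decision would have to run in both functions, or the function would have to be structured so that the size depends only on cheap inputs.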
I would love to hear feedback and suggestions for improvements,
or about other possible implementations for variable-sized stack
returns.