Memory/Allocation attributes for variables

Sun May 30 02:18:38 UTC 2021

Hello dear D community,

Personally, I'm interested in security, memory safety and such,
although I'm not an expert.
I would like to know whether D has memory/allocation attributes (I
will use both terms interchangeably), or whether someone knows a
library that statically asserts that variables of an attributed
type only contain values allocated in compatible allocation
regions. Attributes for memory safety can be seen as an extension
of the type system: these memory attributes constrain the address
domain of values instead of the value domain itself, so they
become part of the value domain of a pointer or reference.

Now there are A LOT of different allocation regions and scopes
for variables, for whatever purpose:

  - static allocation
  - stack frame
  - dynamic allocation without GC
  - dynamic allocation with GC
  - fast allocators optimized for specific objects, or even
function-specific allocators
  - peripheral addresses
  - yes, even a register as an allocation region (which keeps a
value out of RAM so that it cannot easily be overwritten; this is
useful for security reasons, like storing pointer encryption keys
or stack canaries, or to assign common registers to variables)
  - or memory-only allocation (which requires that a value is
never stored/held in registers)

Memory safety problems often boil down to the program
accidentally storing a pointer value into a variable for which it
is *semantically* out of bounds, or which then points to a memory
area that is too small w.r.t. the variable's purpose or scope. (In
languages with better type safety, such aliasing of variables is
probably impossible; typical cases are unbounded data structures
like C's variadic arguments or attacker-controlled variable-length
arrays.)

I don't know of any other language yet which has allocation
attributes for pointer/reference variables, or allocation
attributes for value-typed variables which restrict the
allocation region for the data of the variable (a kind of
contract on the variable level). However, value-type variables
are another story, because they are allocated at the same time as
they are defined; there, such attributes would only serve as an
expressive generalization, extending them to value types and even
to control structures.

Looking at the attributes D provides, I definitely see that memory
safety concerns are addressed by existing attributes. But I
personally find them rather unintuitive to use, difficult for
beginners to learn, and not flexible enough (like the two
different versions of `return ref scope`). As currently defined,
those attributes annotate the sources of data flow (like function
arguments) rather than its destinations.

What I imagine: more specific scopes (more specifically
attributed types) or allocation regions correspond to more
generalized types. Some scopes/allocation regions are contained
within others (these smaller contained regions are virtually base
types of bigger regions), and some regions are disjoint (but they
should never partially overlap, which would go against the
structured-programming paradigm and the inheritance analogy in
OOP). This results in scope polymorphism, which statically or
dynamically checks the address type of RHS expressions during
assignments, so that memory safety becomes a special case of type
safety.
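To make the subtyping idea a bit more concrete, here is a rough library-level sketch in today's D. Everything in it is my own assumption for illustration: `Region`, `outlives` and `RegionPtr` are hypothetical names, only three regions are modeled, and containment is simplified to a linear lifetime order (a real design would need a partial order, since some regions are disjoint). A pointer tagged with a longer-lived region can be converted into a pointer tagged with a shorter-lived one, but not the other way round, which is the static half of the check.

```d
/// Hypothetical allocation regions, listed from shortest to longest lifetime.
enum Region { currentFrame, callerFrame, staticData }

/// A pointer from region `src` may flow into a variable tagged `dst`
/// only if `src` lives at least as long as `dst` (simplified to a linear order).
enum bool outlives(Region src, Region dst) = src >= dst;

/// A pointer whose type records the allocation region of its target.
struct RegionPtr(Region R, T)
{
    private T* ptr;

    /// Unchecked entry point: the caller vouches that `p` lies in region `R`.
    this(T* p) { ptr = p; }

    /// Scope polymorphism: accept a pointer from any region that outlives `R`.
    this(Region S)(RegionPtr!(S, T) other) if (outlives!(S, R))
    {
        ptr = other.ptr;
    }

    ref T deref() { return *ptr; }
}
```

Usage: `RegionPtr!(Region.currentFrame, int)(staticPtr)` compiles, while converting a `currentFrame` pointer into a `staticData` one is rejected by the template constraint at compile time.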

I could annotate return values with attributes to make clear that 
a function returns GC-allocated memory, e.g. using a @gc 
attribute.

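```d
// `@gc`, `@nogc` on variables, `@new` and `@newcpp` are proposed (hypothetical) attributes.
@gc string[] stringifyArray(T : U[], U)(T arr) {
     import std.algorithm.iteration : map;
     import std.array : array;
     import std.conv : to;
     // map is lazy; .array turns it into a GC-allocated string[]
     return arr.map!(value => value.to!string).array;
}
@nogc stringtable = stringifyArray([1, 2, 3]);    // error: GC memory in a @nogc variable

// a useless factory example
@new auto makeBoat(double length, double height, Color c) {
     import std.experimental.allocator : allocatorObject, make, theAllocator;
     import std.experimental.allocator.mallocator : Mallocator;

     auto previous = theAllocator;
     theAllocator = allocatorObject(Mallocator.instance);
     scope (exit) theAllocator = previous;   // restore the previous allocator

     // `new Boat(...)` would always use the GC, so allocate through theAllocator
     return theAllocator.make!Boat(length, height, c);
}

// combining multiple attributes gives a union of both, which is valid for reference variables
@new @newcpp @gc Boat boat = makeBoat( ... );
// technically, a union of attributes for value types is possible, but it would
// require inferring the most appropriate attribute from context, which is difficult
```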

Variables with no attributes allow any pointer for assignment and 
will infer the proper attribute from the assignment.
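For comparison, today's D can only express the negative side of this, and only at function granularity: a `@nogc` function may not call anything that might allocate with the GC. The sketch below reuses the idea of `stringifyArray` from the example above (eager version via `.array`); `callableFromNogc` is just a name I made up for the check.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.conv : to;

// GC-allocating helper: to!string and .array both allocate with the GC.
string[] stringifyArray(T : U[], U)(T arr) {
    return arr.map!(value => value.to!string).array;
}

// @nogc applies to a whole function body: any call inside it that may
// allocate with the GC is rejected, so this check evaluates to false.
enum bool callableFromNogc = __traits(compiles, () @nogc {
    int[3] data = [1, 2, 3];
    auto strings = stringifyArray(data[]);
});
```

So the allocation region of a *value* can currently only be constrained indirectly, by forbidding whole classes of calls.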

Some of these use cases are already covered by existing
attributes:

  - `scope` ensures that a reference variable is not written to
an allocation region outside the current function block (which
corresponds to using "`@scope(function)`" with the argument, see
below), and it would be type-unsafe to assign it to a variable
with a larger scope. "Scope" basically means that the argument
belongs to a stack frame in the caller chain. (It corresponds to
arguments annotated with "`@caller`", see below.) It tells the
function that the referenced value has a limited lifetime in a
caller's stack frame even though it is a reference, and that the
reference could become invalid after the function returns, so the
function must not write it to variables outside the function. For
arguments this is very useful, and I would rather have the
complementary case be the explicit one. That's where `in` is
really useful as a short form.
  - `ref` specifies that the actual allocation region of a
variable's value is outside of the function scope in which the
variable is visible (or used). (`out` is similar.)
  - `return ref` specifies that the value (referenced by the
returned reference) is in the same allocation scope as the
argument annotated with `return ref` (corresponds to annotating
the return type with "`@scope(argName)`", see below).
  - `return ref scope`, a combination of the two above: the return
type is seen to have the same allocation region as the one used
by this annotated argument.
  - `__gshared`, `shared`: variables with these attributes are
stored in a region accessible across threads (without them, D's
module-level and `static` variables are thread-local). Sharing
across threads is the default in C, so `__gshared` corresponds to
a plain C global variable. Note that neither implies C's
`volatile`; that guarantee is closer to the `@memory` attribute
proposed below.
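For reference, this is roughly how the existing `scope` and `return ref` annotations look in current D. The function names are only illustrations, and full `scope` escape checking assumes compiling with `-preview=dip1000`.

```d
/// `scope`: the callee promises not to let `xs` escape beyond this call.
int sum(scope const(int)[] xs) @safe @nogc nothrow pure
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

/// `return ref`: the returned reference lives in the same allocation scope
/// as whichever annotated argument it refers to.
ref int larger(return ref int a, return ref int b) @safe
{
    return a >= b ? a : b;
}
```

Note how both annotations sit on the *sources* (parameters), not on the variable that finally receives the reference.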

Here is a (really long) collection of many possible memory 
attributes I am looking for. They define which addresses of 
values are accepted for the pointer/reference:

  - @auto: allocation in any stack frame, which includes 
fixed-size value-type variables passed as arguments or return 
value
  - @stack: dynamic allocation in any stack frame (alloca)
  - @loop: allocation in the region which lives as long as the 
current loop lives
  - @scope(recursion): an allocation scope not yet available in D,
I believe: a scope which lives as long as the entire recursion,
the functional-programming equivalent of `@loop`. Locals in this
scope are accessible to all recursive calls of the same function.
  - @scope(function): allocation in the current stack frame 
(`scope`d arguments are a special case of this)
  - @scope(label): allocation in the scope of the labeled control 
structure
  - @scope(identifier): allocation in the same scope as the 
specified variable name, `return ref` can be seen as special case 
for return types.
  - @static: allocation/scope in static memory segment (lifetime 
over entire program runtime), `static` variables and control 
structures are a special case of this attribute
  - @caller: allocation in the caller's stack frame (usable for
convenient optimizations as shown below), an "implicit argument"
when used for value types, corresponds to `ref scope` for
reference-type variables. Something in between "static" and
"auto".
  - @gc: allocation region managed by D's garbage collector
  - @nogc: disallows pointer/reference to GC-allocated data
  - @new: allocation region managed by Mallocator
  - @newcpp: allocation region managed by the C++ standard-library
allocator, eases C++ compatibility
  - @peripheral: target- or even linkerscript-specific memory 
region for peripherals
  - @register: only stored in a register (with compile-time error 
if not possible)
  - @shared: allocation region for values which are synchronized 
between threads
  - @memory: never stored in a register (the use case can overlap
with "`@peripheral`"; it's used for variables whose content can
change non-deterministically and must be reloaded from memory each
time, for example variables modified by interrupt handlers; it
also prevents unwanted optimizations)
  - @make(allocator): allocated by the given allocator (dynamic 
type check required, if "allocator" is a dynamic object)

In the basic version, for reference variables, these attributes
statically/dynamically assert that a given pointer value is within
the bounds of that allocation region. Of course, this is a long
list of personal ideas and some of them could be unpopular in the
community. But I think all of them would be a tribute to systems
programming.
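The dynamic half of such an assertion already has a building block in druntime for the `@gc` case: `GC.addrOf` from `core.memory` returns the base address of the GC block containing a pointer, or null if the pointer is not GC memory. The helpers `inGcRegion` and `assumeGc` below are hypothetical names; this is only a sketch of what a checked assignment to a `@gc` reference might do at runtime.

```d
import core.memory : GC;

/// Runtime analogue of a `@gc` check: true if `p` points into a GC block
/// (interior pointers count, since addrOf returns the block's base address).
bool inGcRegion(const(void)* p) nothrow
{
    return GC.addrOf(cast(void*) p) !is null;
}

/// Hypothetical checked assignment for a `@gc` reference variable:
/// accepts null or a GC pointer, and fails an assertion otherwise.
T* assumeGc(T)(T* p)
{
    assert(p is null || inGcRegion(p), "pointer is not GC-allocated");
    return p;
}
```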

Why are such attributes useful? First, because type-safe design
means restricting value domains as much as possible, so that each
is only as large as required. They restrict the address (pointer
value) at which a value bound to a variable can be located, and
they provide additional static type checks as well as *allocation
transparency* (something which I miss in every language I have
used so far). The good thing is that, if no attribute is provided,
it can be inferred from the location where a value-typed variable
is defined, or from the assigned pointer value for reference
types.
Maybe also useful: with additional memory safety attributes, it
could become legitimate to assign to `scope`d reference variables.

For reference-type variables, these attributes are simple value
domain checks on the pointer variable. A disadvantage of memory
attributes is (as with polymorphism) that runtime checks might be
needed in some cases where static analysis isn't sufficient (for
example, when attributes are cast).
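The polymorphism analogy can be made literal: D's class downcast `cast(Derived) obj` performs a runtime check and yields null on failure, and an "attribute cast" could behave the same way. In this sketch `Arena` and `castInto` are hypothetical names, standing in for any fixed region (an arena, or a peripheral window).

```d
class Base {}
class Derived : Base {}

/// Hypothetical fixed allocation region.
struct Arena
{
    ubyte[64] buffer;

    /// Checked "attribute cast" into this region: like `cast(Derived) base`,
    /// it yields null when the runtime bounds check fails.
    ubyte* castInto(ubyte* p)
    {
        ubyte* lo = buffer.ptr;
        ubyte* hi = buffer.ptr + buffer.length;
        return (p >= lo && p < hi) ? p : null;
    }
}
```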

An interesting extension is a generalization to value-type
variables. It can generalize the `scope` and `return` attributes
to value types. While probably controversial, it could allow fine
control over variable allocation and force exactly where a
value-typed variable is allocated (allocation guarantees). You
could indirectly define a variable in a nested code block which is
allocated for another scope. The only major disadvantage I can
think of is that it cannot simply be implemented as a library
add-on.

```d
outer: {
     // ...
     @scope(inner) uint key = generateKey(seed);  // precomputes the RHS and
     // initializes the LHS with the memoized value when entering the "inner" block
     seed.destroy();    // do something with seed, modify/destroy it, whatever
     // key is not available/accessible here
     // Message cipher;   // <-- implicit but uninitialized
     inner: if (useSecurity) {
         // if not entered, the init value of the variable is used
         @scope(outer) Message cipher = encrypt(key);
         // Implicitly defines "cipher" uninitialized in the "outer" scope.
         // Generates a default init in all other control flow paths
         // without an @scope(outer) definition.
     }
     //else cipher = Message.init;   // <-- implicit, actual initialization
     decrypt(cipher, key);    // error, key is only available in the "inner" scope
}
```

Some would criticize the unconventional visibility of `cipher`,
which doesn't follow common visibility rules. For example, if
`static` variables are defined in functions, they are still only
visible in the function itself and not in the entire scope in
which they live. So a likely improvement would be that the
attribute does not affect visibility, only the point of actual
creation/destruction. Looking only at the previous example, this
would seem useless at first, but it's not once loops are
considered (and variables with `@loop` scope, which are created on
loop entry and only destroyed on loop exit).
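Both halves of this are visible in current D: a `static` local has a long lifetime but function-only visibility, and the proposed `@loop` lifetime can be emulated by hand by hoisting a variable into an enclosing block. The names `nextId`, `Scratch`, `destroyed` and `sumWithScratch` are made up for this sketch; the destructor counter exists only to make the lifetime observable.

```d
/// `static` local: lives as long as the thread, but is visible only here.
int nextId()
{
    static int calls;
    return ++calls;
}

int destroyed;  // counts Scratch destructor calls (illustration only)

struct Scratch
{
    int uses;
    ~this() { ++destroyed; }
}

/// Manual emulation of a `@loop` variable: created once on loop entry,
/// destroyed once on loop exit, instead of once per iteration.
int sumWithScratch(const(int)[] xs)
{
    int total;
    {
        Scratch s;              // "loop entry"
        foreach (x; xs)
        {
            ++s.uses;
            total += x;
        }
    }                           // "loop exit": s is destroyed here
    return total;
}
```

The attribute would let the compiler do this hoisting while keeping the declaration where it is used.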

Interesting cases can also emerge for additional user
optimizations that avoid costly recomputation by using a larger
scope as the allocation region:

```d
double permeability(Vec2f direction) {
     @caller Vec2f grad = calculateTextureDerivative();
     // "grad" is a common setup shared by all calls to "permeability"
     // from the same caller instance.
     // It is hidden from the caller because it's an implementation detail of this function.
     // All calls of "permeability" by the same caller will use the same variable.
     // It would be implemented as an invisible argument whose initialization
     // happens in the caller. The variable is stored on the caller's side as an
     // invisible variable and is passed with every call.
     return scalprod(direction, grad);
}
```

The main benefits of this feature are readability and, in some
cases, optimization: the initializing function is not executed on
every call, only when the repetition is actually needed, and that
decision can be made in the callee instead.
For closures the `@caller` scope is clear, but it also works for
non-closure functions as an invisible argument. Modifications to a
`@caller ref` variable are remembered for consecutive calls from
the same caller stack frame, whereas `@caller` without `ref` would
probably only modify a local copy.
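Written out by hand in today's D, the "invisible argument" desugaring of `@caller ref` might look like the sketch below. `PermeabilityCtx` is a hypothetical name for the hidden caller-side variable, `calculateTextureDerivative` is a stand-in returning a constant, and the call counter is only there to show that the setup runs once per caller context.

```d
struct Vec2f { float x, y; }

float scalprod(Vec2f a, Vec2f b) { return a.x * b.x + a.y * b.y; }

int derivativeCalls;  // counts the expensive setup (illustration only)

/// Stand-in for the expensive setup from the example above.
Vec2f calculateTextureDerivative()
{
    ++derivativeCalls;
    return Vec2f(0.5f, 2.0f);
}

/// The hidden `@caller` variable, made explicit: owned by the caller's
/// stack frame and passed by `ref`, so changes persist across calls.
struct PermeabilityCtx
{
    bool ready;
    Vec2f grad;
}

double permeability(Vec2f direction, ref PermeabilityCtx ctx)
{
    if (!ctx.ready)  // initialize lazily on the first call from this caller
    {
        ctx.grad = calculateTextureDerivative();
        ctx.ready = true;
    }
    return scalprod(direction, ctx.grad);
}
```

The attribute would hide `ctx` from the call site; here the caller has to declare it and pass it explicitly.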

Or being able to create arrays easily on the stack, which is yet
another extension:

```d
@auto arr1 = [0, 1, 2, 3];  // asserts fixed size: okay, but a variable size would fail
@stack arr2 = dynamicArr.dup;   // creates a copy on the stack; the stack is "scope"d
```

An easy but probably limited implementation would set
`theAllocator` before the initialization of such an attributed
value-type variable and reset `theAllocator` afterwards to the
previous allocator.
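A minimal sketch of that save/set/restore approach with `std.experimental.allocator` follows. `withAllocator` is a hypothetical helper name; `allocatorObject`, `theAllocator`, `makeArray` and `dispose` are the real Phobos functions, and `scope (exit)` makes sure the previous allocator is restored even if the initializer throws.

```d
import std.experimental.allocator : allocatorObject, dispose, makeArray, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

/// Hypothetical helper: evaluates `init` while `theAllocator` is temporarily
/// replaced by `alloc`, then restores the previous allocator.
auto withAllocator(alias init, A)(A alloc)
{
    auto previous = theAllocator;
    theAllocator = alloc;
    scope (exit) theAllocator = previous;
    return init();
}
```

One limitation this makes visible: the resulting memory must later be freed with the same allocator, which the variable's attribute would have to remember.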

Finally, one could annotate control structures even more
generally with attributes, to define at whose scope's entry the
control structure's arguments are evaluated (e.g. `static if` is a
special case which represents `@static if` in terms of
attributes), but that is yet another story and unrelated to
allocation.

That's it; I'm sorry for the long post. It took me a while to
write it down and reread it.
Regards!