Memory/Allocation attributes for variables
Elmar
chrehme at gmx.de
Sun May 30 02:18:38 UTC 2021
Hello dear D community,
personally I'm interested in security, memory safety and such
although I'm not an expert.
I would like to know whether D has memory/allocation attributes (I
will use both terms interchangeably), or whether someone knows a
library for that, which statically assert that variables of an
attributed type only contain values allocated in compatible
allocation regions. Attributes for memory safety can be seen as an
extension to the type system because they constrain the address
domain of values instead of the value domain itself, and so they
become part of the value domain of a pointer or reference.
Now there are A LOT of different allocation regions and scopes
for variables for whatever purpose:
- static allocation
- stack frame
- dynamic allocation w/o GC
- dynamic allocation with GC
- fast allocators optimized for specific objects or even
function-specific allocators
- peripheral addresses
- yes, even a register as allocation region (which allows a value
to not be stored in RAM and thus not be easily overwritten,
which is useful for security reasons like storing pointer
encryption keys or stack canaries, or to assign common registers
to variables)
- or memory-only allocation (which requires a value to not be
stored/held in registers)
Memory safety problems often boil down to the program
accidentally storing a pointer value into a variable that is
*semantically* out of bounds or that points to a memory area too
small w.r.t. the variable's purpose or scope. (In languages with
better type safety, aliasing of variables is probably impossible;
the typical cases are unbounded data structures like C's variadic
arguments or attacker-controlled variable-length arrays.)
I don't know any other language yet which has allocation
attributes for pointer/reference variables, or allocation
attributes for value-typed variables which restrict the
allocation region for the data of the variable (some kind of
contract on variable level). However, value-type variables are
another story because they are allocated at the point where they
are defined, so for them the attributes would only serve as an
expressive generalization, extended to value types and even
control structures.
Looking at what attributes D provides, I definitely see that
memory-safety-related concerns are addressed with existing
attributes. But I personally find them rather unintuitive to use,
difficult to learn for beginners, and not flexible enough (like
the two different versions of `return ref scope`). As currently
defined, those attributes don't annotate destinations of data flow
but sources of data flow (like function arguments).
What I imagine: More specific scopes (more specifically
attributed types) or allocation regions correspond to more
generalized types. Some scopes/allocation regions are contained
within others (these smaller contained regions are virtually base
types of bigger regions) and some regions are disjoint (but they
should not partially overlap each other, which would go against the
structured-programming paradigm and the inheritance analogy in OOP).
This results in Scope Polymorphism, which statically/dynamically
checks the address type of RHS expressions during assignments, so
that memory safety becomes a special case of type safety.
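To make the containment idea a bit more concrete, here is a minimal library-level sketch in today's D. Everything in it is hypothetical and only for illustration: the `Region` enum, the `parent` and `isWithin` helpers and the `RegionPtr` wrapper are not existing D features, and the tagging constructor simply trusts the caller instead of verifying the address.

```d
// Hypothetical region tree: `any` contains `heap` and `stack`,
// and `heap` contains `gc` and `malloc`.
enum Region { any, heap, gc, malloc, stack }

/// Parent of each region in the containment tree (`any` is the root).
Region parent(Region r) pure nothrow @nogc @safe {
    final switch (r) {
        case Region.any:    return Region.any;
        case Region.heap:   return Region.any;
        case Region.stack:  return Region.any;
        case Region.gc:     return Region.heap;
        case Region.malloc: return Region.heap;
    }
}

/// True if `inner` is the same region as `outer` or nested inside it.
bool isWithin(Region inner, Region outer) pure nothrow @nogc @safe {
    for (;;) {
        if (inner == outer) return true;
        if (inner == Region.any) return false;
        inner = inner.parent;
    }
}

/// Pointer tagged with the region its target is assumed to live in.
struct RegionPtr(T, Region region) {
    T* ptr;

    /// Tags a raw pointer; the caller is trusted, nothing is checked here.
    this(T* p) { ptr = p; }

    /// Widening conversion: only compiles if `source` lies within `region`.
    this(Region source)(RegionPtr!(T, source) rhs)
    if (source != region && isWithin(source, region)) {
        ptr = rhs.ptr;
    }
}

void main() {
    auto g = RegionPtr!(int, Region.gc)(new int(5));
    auto h = RegionPtr!(int, Region.heap)(g); // fine: gc is inside heap
    assert(*h.ptr == 5);

    // Narrowing or crossing into a disjoint region is rejected statically.
    static assert(!__traits(compiles, RegionPtr!(int, Region.gc)(h)));
    static assert(!__traits(compiles, RegionPtr!(int, Region.stack)(g)));
}
```

The assignment check is an ordinary template constraint here, which is the sense in which memory safety would become a special case of type safety.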
I could annotate return values with attributes to make clear that
a function returns GC-allocated memory, e.g. using a @gc
attribute.
```d
@gc string[] stringifyArray(T : U[], U)(T arr) {
    import std.algorithm.iteration : map;
    import std.array : array;
    import std.conv : to;
    // map returns a lazy range, so materialize it into an array
    return arr.map!(value => value.to!string).array;
}
@nogc auto stringtable = stringifyArray([1, 2, 3]); // error!
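// a useless factory example
@new auto makeBoat(double length, double height, Color c) {
    import std.experimental.allocator : allocatorObject, make, theAllocator;
    import std.experimental.allocator.mallocator : Mallocator;
    auto previous = theAllocator;
    scope(exit) theAllocator = previous; // restore the allocator from before
    theAllocator = allocatorObject(Mallocator.instance);
    // a plain `new` always uses the GC, so allocate through theAllocator
    return theAllocator.make!Boat(length, height, c);
}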
// combining multiple attributes gives a union of both which is
// valid for reference variables
@new @newcpp @gc Boat boat = makeBoat( ... );
// technically, a union of attributes for value types is possible
// but would require inferring the most appropriate attribute from
// context, which is difficult
```
Variables with no attributes allow any pointer for assignment and
will infer the proper attribute from the assignment.
Some of these use cases are already covered by existing
attributes:
- `scope` makes sure that a reference variable is not written
to an allocation region outside the current function block (which
corresponds to using "`@scope(function)`" with the argument, see
below), and it would be type-unsafe to assign it to a variable
type with larger scope. "Scope" basically means the argument
belongs to a stack frame in the caller chain. (It corresponds to
arguments annotated with "`@caller`", see below.) It's used to
tell the function that the referenced value has a limited
lifetime in a caller stack frame despite being a reference
variable, and that the reference could become invalid after the
function returns, so the function must not write the value to
variables outside itself. For arguments this is very useful and I
would rather prefer the complementary case to be explicit. That's
where `in` is really useful as a short form.
- `ref` specifies that the actual allocation region of a
variable's value is outside of the function scope in which the
variable is visible (or used). (`out` is similar.)
- `return ref` specifies that the value (referenced by the
returned reference) is in the same allocation scope as the
argument annotated with `return ref` (corresponds to the
annotation of the return type with "`@scope(argName)`", see
below).
- `return ref scope`, a combination of the two above. The return
type is treated as having the same allocation region as the one
used by this annotated argument.
- `__gshared`, `shared`. Variables with these attributes are stored
in a region accessible across threads. This is the default for
globals in C, so `__gshared` variables correspond to C's globals
(such as volatile values), which are accepted by `@memory`
references.
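For readers less familiar with the existing attributes, here is a small sketch of `scope` and `return ref` as they work in current D. The function names `readThrough` and `larger` and the variable `leaked` are made up for this example, and the escape check mentioned in the comment is only enforced in `@safe` code compiled with `-preview=dip1000`.

```d
const(int)* leaked; // module-level variable that outlives any call

// `scope`: the callee promises not to let the pointer escape.
int readThrough(scope const(int)* p) @safe {
    // leaked = p; // rejected with -preview=dip1000: `p` would escape
    return *p;
}

// `return ref`: the result may refer to `a` or `b`,
// so it must not outlive either of them.
ref int larger(return ref int a, return ref int b) @safe {
    if (a > b) return a;
    return b;
}

void main() {
    int x = 3, y = 7;
    assert(readThrough(&x) == 3);
    larger(x, y) = 10; // assigns through the returned reference, here to y
    assert(x == 3 && y == 10);
}
```

Both attributes sit on the parameters, which is what I mean by annotating the sources of data flow rather than the destinations.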
Here is a (really long) collection of many possible memory
attributes I am looking for. They define which addresses of
values are accepted for the pointer/reference:
- @auto: allocation in any stack frame, which includes
fixed-size value-type variables passed as arguments or return
value
- @stack: dynamic allocation in any stack frame (alloca)
- @loop: allocation in the region which lives as long as the
current loop lives
- @scope(recursion): allocation-scope not yet available in D I
believe, scope which lives as long as the entire recursion lives,
equivalent to `loop` in the functional sense. Locals in this
scope are accessible to all recursive calls of the same function.
- @scope(function): allocation in the current stack frame
(`scope`d arguments are a special case of this)
- @scope(label): allocation in the scope of the labeled control
structure
- @scope(identifier): allocation in the same scope as the
specified variable name, `return ref` can be seen as special case
for return types.
- @static: allocation/scope in static memory segment (lifetime
over entire program runtime), `static` variables and control
structures are a special case of this attribute
- @caller: allocation in the caller's stack frame (usable for
convenient optimizations like shown below), an "implicit
argument" when used for value types, corresponds to `ref scope`
for reference-type variables. Something in between "static" and
"auto".
- @gc: allocation region managed by D's garbage collector
- @nogc: disallows pointer/reference to GC-allocated data
- @new: allocation region managed by Mallocator
- @newcpp: allocation region managed by the stdcpp allocator,
eases C++ compatibility
- @peripheral: target- or even linkerscript-specific memory
region for peripherals
- @register: only stored in a register (with compile-time error
if not possible)
- @shared: allocation region for values which are synchronized
between threads
- @memory: never stored in a register (use case can overlap with
"`@peripheral`", it's used for variables whose content can change
non-deterministically and must be reloaded from memory each time,
for example interrupt-handler modified variables, it also
prevents optimization when unwanted)
- @make(allocator): allocated by the given allocator (dynamic
type check required, if "allocator" is a dynamic object)
In the basic version for reference variables these attributes
statically/dynamically assert that a given pointer value is in
bounds of that allocation region. Of course, this is a long list
of personal ideas and some of them could be unpopular in the
community. But I think, all of them would be a tribute to Systems
programming.
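As a rough idea of what the dynamic variant of such a check could look like as a library today, here is a sketch for the `@gc` and `@nogc` cases only. It relies on `core.memory.GC.addrOf`, which returns `null` for pointers the GC does not own; the names `isGcAllocated`, `GcPtr` and `NoGcPtr` are hypothetical stand-ins for the proposed attributes, and the check happens at run time in the constructor, not statically.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

/// True if `p` points into a memory block owned by D's garbage collector.
bool isGcAllocated(const(void)* p) nothrow @nogc {
    return GC.addrOf(p) !is null;
}

/// Hypothetical stand-in for a `@gc T*` variable.
struct GcPtr(T) {
    T* ptr;
    this(T* p) {
        assert(p is null || isGcAllocated(p), "pointer is not GC-allocated");
        ptr = p;
    }
}

/// Hypothetical stand-in for a `@nogc T*` variable.
struct NoGcPtr(T) {
    T* ptr;
    this(T* p) {
        assert(!isGcAllocated(p), "pointer is GC-allocated");
        ptr = p;
    }
}

void main() {
    auto onGcHeap = new int(42);
    auto a = GcPtr!int(onGcHeap); // passes the check

    auto raw = cast(int*) malloc(int.sizeof);
    scope(exit) free(raw);
    auto b = NoGcPtr!int(raw);    // passes the check

    int local;
    assert(isGcAllocated(a.ptr));
    assert(!isGcAllocated(b.ptr));
    assert(!isGcAllocated(&local));
}
```

Regions like the stack or static memory have no comparable query in druntime as far as I know, so this only covers a small part of the list above.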
Why are such attributes useful? First, because type-safe design
means restricting value domains as much as possible so that they
are only as large as required. The attributes restrict the address
(pointer value) at which a value bound to a variable can be located
and provide additional static type checks as well as *allocation
transparency* (something which I miss in every language I have used
so far). The good thing is, if no attribute is provided, it can be
inferred from the location where the value-typed variable is
defined, or from the assigned pointer value for reference types.
Maybe also useful: with additional memory safety attributes, it
could become legitimate to assign to `scope`d reference variables.
For reference-type variables, these attributes are simple value
domain checks of the pointer variable. A disadvantage of memory
attributes is (like with polymorphism) that runtime checks might be
needed in some cases when static analysis isn't sufficient (if
attributes are cast).
An interesting extension is a generalization to value-type
variables. It can generalize the `scope` and `return` attributes
to value types. While probably not uncontroversial, it could allow
fine control over variable allocation and force where exactly a
value-typed variable is allocated (allocation guarantees). You
could indirectly define a variable in a nested code block which is
allocated for another scope. The only main disadvantage I can
think of is that it cannot simply be created as a library add-on.
```d
outer: {
    // ...
    @scope(inner) uint key = generateKey(seed); // precomputes the RHS
    // and initializes the LHS with the memorized value when
    // entering the "inner" block
    seed.destroy(); // do something with seed, modify/destroy it, whatever
    // key is not available/accessible here
    // Message cipher; // <-- implicit but uninitialized
    inner: if (useSecurity) {
        // if not entered, the init-value of the variable is used
        @scope(outer) Message cipher = encrypt(key);
        // Implicitly defines "cipher" uninitialized in "outer" scope.
        // Generates default init in all other control flow paths
        // without @scope(outer) definition
    }
    //else cipher = Message.init; // <-- implicit, actual initialization
    decrypt(cipher, key); // error, key is only available in
    // the "inner" scope
}
```
Some would criticize the unconventional visibility of `cipher`
which doesn't follow common visibility rules. For example if
`static` variables are defined in functions, they are still only
visible in the function itself and not in the entire scope in
which they live. So a likely improvement would be that the
visibility is not impacted by the attribute, only the point of
actual creation/destruction. Just looking at the previous
example, this would seem useless at first, but it's not if loops
are considered (and variables which have `@loop` scope, i.e.
which are created on loop entry and only destroyed on loop exit).
Also interesting cases can emerge for additional user
optimization in order to avoid costly recomputation by using a
larger scope as allocation region:
```d
double permeability(Vec2f direction) {
    @caller Vec2f grad = calculateTextureDerivative();
    // "grad" is a common setup shared by all calls to
    // "permeability" from the same caller instance.
    // It is hidden from the caller because it's an
    // implementation detail of this function.
    // All calls of "permeability" by the same caller will use the
    // same variable.
    // It would be implemented as an invisible argument whose
    // initialization happens in the caller. The variable is stored
    // on the caller's side as an invisible variable and is passed
    // with every call.
    return scalprod(direction, grad);
}
```
A main benefit of this feature is readability and, in some cases,
optimization, because the setup computation is not repeated for
every call but only when a repetition is needed, which the callee
can decide by itself.
For closures the `@caller` scope is clear, but it also works for
non-closure functions as an invisible argument. Modifications to
a `@caller ref` variable are remembered for consecutive calls
from the same caller stack frame, whereas `@caller` without ref
would probably only modify a local copy.
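Written out by hand in today's D, the invisible argument could look roughly like the sketch below. The `PermeabilityContext` struct, the `derivativeCalls` counter and the bodies of `scalprod` and `calculateTextureDerivative` are placeholders I made up so that the example is self-contained; with a `@caller` attribute the compiler would generate the context slot and pass it along automatically.

```d
struct Vec2f { float x, y; }

double scalprod(Vec2f a, Vec2f b) { return a.x * b.x + a.y * b.y; }

int derivativeCalls; // counts how often the expensive setup runs

// Placeholder for an expensive computation.
Vec2f calculateTextureDerivative() {
    ++derivativeCalls;
    return Vec2f(0.5f, 2.0f);
}

/// Caller-owned slot which `@caller` would create invisibly.
struct PermeabilityContext {
    bool initialized;
    Vec2f grad;
}

double permeability(Vec2f direction, ref PermeabilityContext ctx) {
    if (!ctx.initialized) {
        // The setup runs once per caller stack frame.
        ctx.grad = calculateTextureDerivative();
        ctx.initialized = true;
    }
    return scalprod(direction, ctx.grad);
}

void main() {
    PermeabilityContext ctx; // lives in the caller's stack frame
    double sum = 0;
    foreach (i; 0 .. 3)
        sum += permeability(Vec2f(1, 0), ctx);
    assert(derivativeCalls == 1); // setup was shared by all three calls
    assert(sum == 1.5);
}
```

The explicit version works, but it leaks an implementation detail of `permeability` into every caller, which is exactly what the attribute would hide.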
Yet another extension would be the ability to easily create arrays
on the stack:
```d
@auto arr1 = [0, 1, 2, 3]; // asserts fixed-size, okay, but
                           // variable size would fail
@stack arr2 = dynamicArr.dup; // create a copy on stack, the
                              // stack is "scope"d
```
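For comparison, this is roughly how both cases can be spelled in current D without any new attributes: a sketch using `std.array.staticArray` for the fixed-size case and `alloca` from `core.stdc.stdlib` for the runtime-sized case. The variable names follow the example above, and the `alloca` buffer is only valid until the enclosing function returns, which nothing in the type records.

```d
import core.stdc.stdlib : alloca;
import std.array : staticArray;

void main() {
    // Fixed-size array stored directly in main's stack frame.
    auto arr1 = [0, 1, 2, 3].staticArray;
    static assert(is(typeof(arr1) == int[4]));

    int[] dynamicArr = [5, 6, 7];
    // Runtime-sized stack buffer; it becomes invalid when main returns.
    auto buffer = cast(int*) alloca(dynamicArr.length * int.sizeof);
    int[] arr2 = buffer[0 .. dynamicArr.length];
    arr2[] = dynamicArr[]; // element-wise copy into the stack buffer

    assert(arr2 == [5, 6, 7]);
    assert(arr2.ptr !is dynamicArr.ptr); // a real copy, not an alias
}
```

The `int[]` slice over the `alloca` buffer looks like any other slice, so a `@stack` attribute would add the missing information about where it lives.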
An easy but probably limited implementation would set
`theAllocator` before the initialization of such an attributed
value-type variable and reset `theAllocator` afterwards to the
allocator from before.
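A minimal sketch of that swap-and-restore idea with `std.experimental.allocator`, assuming the `@new` case (Mallocator) only. The helper name `makeWithMalloc` and the `Vec2f` struct are hypothetical, and the caller has to dispose the object with the matching allocator by hand.

```d
import core.memory : GC;
import std.experimental.allocator : allocatorObject, dispose, make, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

struct Vec2f { float x, y; }

/// Hypothetical helper: creates a T while `theAllocator` is temporarily
/// replaced by Mallocator, then restores the allocator from before.
T* makeWithMalloc(T, Args...)(auto ref Args args) {
    auto previous = theAllocator;
    scope(exit) theAllocator = previous;
    theAllocator = allocatorObject(Mallocator.instance);
    return theAllocator.make!T(args);
}

void main() {
    auto v = makeWithMalloc!Vec2f(1.0f, 2.0f);
    scope(exit) Mallocator.instance.dispose(v);

    assert(v.x == 1.0f && v.y == 2.0f);
    assert(GC.addrOf(v) is null); // the object is not in GC memory
}
```

The limitation is that only code which allocates through `theAllocator` is affected; a plain `new` inside the initializer would still go to the GC.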
Finally, one could even more generally annotate control
structures with attributes to define on entry to which scope the
control structure's arguments are evaluated (e.g. `static if` is
a special case which represents `@static if` in terms of
attributes), but this is yet another story and unrelated to
allocation.
This is it, I'm sorry for the long post. It took me a while to
write it down and reread.
Regards!