Memory/Allocation attributes for variables

Tue Jun 1 00:36:17 UTC 2021

Good questions :-)

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 31 May 2021 at 18:21:26 UTC, Elmar wrote:
>> I wonder how programming languages don't see the obvious, to 
>> consider memory safety as a part of type safety 
>> (address/allocation properties to be type properties) and that 
>> memory unsafe code only means an incomplete type system.
>
> All high level programming-languages do. Only the low level 
> don't, and that is one of the things what makes their type 
> systems unsound.

I suppose you mean the "higher" level languages (because C is by 
original definition also a high-level language). I also don't 
know of any "higher" level language that provides the flexibility 
of constraining the value domain of a pointer/reference, except 
for restricting `null` (non-nullable pointers are probably the 
simplest domain constraint for pointers/references). I think not 
even Ada or VHDL has it.

What I'd like to gain with those attributes is a guarantee that 
the referenced value wasn't allocated in a certain address 
region/scope and that it lives in a lifetime-compatible scope. 
Both can be detected by checking the pointer value against an 
interval or a set of intervals. For example, a returned reference 
to an integer could have been created with "malloc" or even a C++ 
allocator, and interfacing functions could annotate parameters 
with such attributes.
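
A minimal sketch of such an interval check, assuming each 
allocator owns one contiguous address range. `Region`, `regionOf` 
and `inAnyRegion` are hypothetical names for illustration, not 
existing D or druntime features.

```d
/// Hypothetical half-open address interval [lo, hi) owned by one allocator.
struct Region
{
    size_t lo;
    size_t hi;

    /// True if `p` points into this region.
    bool contains(const(void)* p) const
    {
        const addr = cast(size_t) p;
        return lo <= addr && addr < hi;
    }
}

/// Builds the region covered by the buffer backing a (hypothetical) pool.
Region regionOf(const(ubyte)[] pool)
{
    const lo = cast(size_t) pool.ptr;
    return Region(lo, lo + pool.length);
}

/// True if `p` lies in at least one of the allowed regions.
bool inAnyRegion(const(void)* p, const(Region)[] allowed)
{
    foreach (r; allowed)
        if (r.contains(p))
            return true;
    return false;
}

unittest
{
    ubyte[64] pool;              // stands in for an allocator's memory
    auto r = regionOf(pool[]);
    int elsewhere;               // a value outside the pool
    assert(r.contains(&pool[8]));
    assert(!r.contains(&elsewhere));
}
```

An attribute on a parameter or return type would then boil down 
to a list of allowed (or forbidden) regions that is checked 
statically where possible and dynamically otherwise.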

With guarantees about the scope of arguments, function 
implementations can avoid buggy reference assignments to outside 
variables. The function could expect compatible references 
allocated with the GC, but the caller doesn't know it. Whether 
any reference variable assignment is legitimate can be checked by 
comparing the source attributes (the reference value, which says 
where the value is allocated) with the destination attributes 
(where the reference is stored in memory). Runtime checks of 
pointer values give an even better degree of memory safety, but 
only if the programmers want to use them. A reference assignment 
is legitimate if the destination scope is compatible with the 
source's scope, and in no other case. I would suggest a lifetime 
rating for value addresses as follows:

*peripheral > system/kernel > global shared > private global 
(TLS) > extern global (TLS) > shared GC allocated > shared 
dynamically allocated > GC allocated (TLS) > dynamically 
allocated (TLS) <=> RAII/scoped/stack > register*
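
A minimal sketch of how this rating could drive the assignment 
check. It assumes one possible reading of the rating: `>` means 
"lives longer" and `<=>` means "not comparable", which is 
rejected conservatively. The enum `Lifetime` and the function 
`mayStore` are hypothetical names, not existing D features.

```d
/// Allocation classes from the rating above, shortest lifetime first.
/// Hypothetical names; D has no such built-in attribute.
enum Lifetime
{
    cpuRegister,
    stack,             // RAII/scoped/stack
    heapTls,           // dynamically allocated (TLS)
    gcTls,             // GC allocated (TLS)
    heapShared,        // shared dynamically allocated
    gcShared,          // shared GC allocated
    externGlobalTls,
    privateGlobalTls,
    globalShared,
    system,            // system/kernel
    peripheral,
}

/// True if a reference stored in a location of class `dest` may point to a
/// value of class `src`, i.e. the value is rated to live at least as long
/// as the location that stores the reference.
bool mayStore(Lifetime dest, Lifetime src)
{
    // Stack and TLS heap are rated "<=>" (not comparable): reject the mix.
    const mixed = (dest == Lifetime.stack && src == Lifetime.heapTls)
        || (dest == Lifetime.heapTls && src == Lifetime.stack);
    return !mixed && src >= dest;
}

unittest
{
    // A stack variable may hold a reference to GC memory ...
    assert(mayStore(Lifetime.stack, Lifetime.gcTls));
    // ... but a shared global must not hold a reference to a stack value.
    assert(!mayStore(Lifetime.globalShared, Lifetime.stack));
}
```

Two stack addresses fall into the same class here, so they still 
need the finer address comparison described next.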

Heap regions are not always comparable to stack or RAII. So the 
current practice of not allowing assignment to RAII references 
(using the `scope` attribute) is probably best continued. 
Everything other than stack addresses is seen as one single 
lifetime region with equal lifetime. The comparison between stack 
addresses assumes that an address deeper in the stack has a 
higher or equal lifetime. The caller could also provide its 
stack frame bounds, which allows this interval to be treated as 
one single lifetime.

The attributes should constrain the possible value domain of 
pointers absolutely, so that no attack with counterfeit pointers 
to certain memory addresses is possible. If I used custom 
allocators for different types, I could anticipate or delimit 
what the pointer value can be.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
>> constraints. Memory safety is violated by storing a pointer 
>> value in a reference which is out of the intended/reasonable 
>> value domain of the pointer (not matching its lifetime).
>
> But how do you keep track of it without requiring that all 
> graphs are acyclic? No back pointers is too constraining.
>
> And no, Rust does not solve this. Reference counting does not 
> solve this. How do you prove that a graph remains fully 
> connected when you change one pointer?

I think this is GC-related memory management, not type checking. 
The memory attributes don't solve memory management problems. The 
problem with reference counting is usually solved by inserting 
weak pointers into cycles (which also resolves the apparent 
contradiction of a cycle of references). Weak references are used 
by those objects which are deeper in the graph of data links. 
Otherwise it's a code smell, and one could refactor the links 
into a joint object in which deleted objects deregister 
themselves. I have already thought about other allocation schemes 
for detecting cycles that could be combined with reference 
counting, for example tagging structs/classes with the ID of the 
connected graph in which they are linked if they aren't leaves. 
But this ID is difficult to change. One could also analyze at 
compile time which pointers can only be part of a cycle, but more 
explanation would lead too far here.

Instead, the problems my idea is intended to solve are

  1. giving hints to programmers (to know which kind of allocated 
memory works with the implementation; stack addresses apparently 
won't generally work with `map`, for example)
  2. having static or dynamic (simple) value domain checks (which 
check whether a pointer value is in the allowed interval(s) of 
the allocation address spaces belonging to the attributes), which 
ensure that only allowed types of allocation are used. These 
checks can be used to statically or dynamically dispatch 
functions. Of course such a check could also be performed 
manually, but it's tedious and requires me to put all the 
different function bodies in one `static if else`.
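
The static half of point 2 can be approximated in today's D, as 
a rough sketch, by carrying the allocation kind in the pointer's 
type and dispatching with template constraints instead of one 
long `static if else`. `Alloc`, `Tagged` and `release` are 
hypothetical names; a real attribute would live in the type 
system instead of a wrapper struct.

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical allocation tags; not part of D or Phobos.
enum Alloc { gc, cHeap, stack }

/// A pointer that carries its allocation tag in its type.
struct Tagged(T, Alloc tag)
{
    T* ptr;
    enum alloc = tag;
}

/// Overload selected at compile time for C-heap pointers: free explicitly.
string release(P)(ref P p) if (P.alloc == Alloc.cHeap)
{
    free(p.ptr);
    p.ptr = null;
    return "freed";
}

/// Overload for GC pointers: dropping the reference is enough.
string release(P)(ref P p) if (P.alloc == Alloc.gc)
{
    p.ptr = null;
    return "left to GC";
}

unittest
{
    auto h = Tagged!(int, Alloc.cHeap)(cast(int*) malloc(int.sizeof));
    assert(release(h) == "freed");

    // No overload accepts stack pointers, so this is rejected statically.
    Tagged!(int, Alloc.stack) s;
    static assert(!__traits(compiles, release(s)));
}
```

Each function body stays separate, and passing a pointer with an 
unsupported allocation kind fails at compile time like any other 
type error.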

It's more of a lightweight solution and works like an ordinary 
type check (value-in-range check).

The feature shines most in function signatures, because they 
separate code and create opacity, which can be countered by 
memory attributes for the return type and argument types.

On Monday, 31 May 2021 at 18:56:44 UTC, Ola Fosheim Grøstad wrote:
>> One important aspect which I forgot: aliasing of variables. I 
>> know, D allows aliased references as arguments by default. 
>> Many memory safety problems derive from aliased variables 
>> which were not assumed to be aliased.
>
> So, how do you know that you don't have aliasing when you 
> provide pointers to two graphs? How do you prove that none of 
> the nodes in the graph are shared?

Okay, I didn't define aliasing. By "aliasing" I mean that 
"aliasing references" (or pointers) either point to the exact 
same address or that the immediately pointed-to classes/structs 
(pointed to by the references/pointers) overlap. I would 
consider anything else more complicated than necessary. The 
definition doesn't care about further indirections. I often only 
consider the directly pointed-to struct or class, a contiguous 
chunk of memory, as "the type". If I code a function, I'm usually 
only interested in the top level of the type (the "root node" of 
the type), and further indirections are handled by nested 
function calls. For example, it suffices if two argument slices 
do not overlap. For that I only need to check aliasing as just 
defined. If you really want two arguments (graphs) to not share 
any single pointer value, I would suggest using a more 
appropriate type than a memory attribute, a type which is 
recursively "unique" (in terms of only using "unique pointers").
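
For the slice case, the top-level check could look like the 
following sketch. `overlaps` is a hypothetical helper name; it 
compares only the memory directly covered by the two slices and 
ignores any further indirections, as in the definition above.

```d
/// True if the memory directly covered by the two slices overlaps.
bool overlaps(T)(const(T)[] a, const(T)[] b)
{
    // An empty slice covers no memory and cannot alias anything.
    if (a.length == 0 || b.length == 0)
        return false;

    const aLo = cast(size_t) a.ptr;
    const aHi = aLo + a.length * T.sizeof;
    const bLo = cast(size_t) b.ptr;
    const bHi = bLo + b.length * T.sizeof;

    // Two half-open intervals intersect iff each starts before the other ends.
    return aLo < bHi && bLo < aHi;
}

unittest
{
    int[] buf = [1, 2, 3, 4, 5, 6];
    assert(overlaps(buf[0 .. 4], buf[2 .. 6]));   // share buf[2] and buf[3]
    assert(!overlaps(buf[0 .. 3], buf[3 .. 6]));  // adjacent, not overlapping
}
```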

Do you think it would be a nice idea to have a data structure 
attribute `unique` next to `abstract` and `final` which 
recursively guarantees that any reference or pointer is a unique 
pointer?

If you are interested in an algorithmic answer to your questions, 
then the best approach (that I can quickly think of) is creating 
an appropriate hash table from all pointers in one graph and 
testing all pointers in the other graph against it (if I cannot 
use any properties of the pointers' values, e.g. that certain 
types and all indirections are allocated in specific pools). But 
that only works with exactly equal pointer values.
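
A minimal sketch of that hash table approach, using D's built-in 
associative arrays as the pointer sets. The `Node` type and the 
functions `collect` and `sharesNode` are hypothetical and only 
serve as an illustration. Like the description above, it detects 
only exactly equal pointer values, not partial overlap.

```d
/// Hypothetical graph node; only the outgoing links matter here.
struct Node
{
    Node*[] next;
}

/// Adds every node reachable from `n` to `seen` (cycles are handled
/// because already visited nodes are skipped).
void collect(Node* n, ref bool[Node*] seen)
{
    if (n is null || (n in seen) !is null)
        return;
    seen[n] = true;
    foreach (child; n.next)
        collect(child, seen);
}

/// True if the graphs rooted at `a` and `b` share at least one node.
bool sharesNode(Node* a, Node* b)
{
    bool[Node*] inA, inB;
    collect(a, inA);
    collect(b, inB);
    foreach (p; inB.byKey)
        if ((p in inA) !is null)
            return true;
    return false;
}

unittest
{
    auto leaf = new Node;
    auto a = new Node;
    auto b = new Node;
    auto c = new Node;
    a.next ~= leaf;
    a.next ~= a;      // a cycle does not break the traversal
    b.next ~= leaf;   // `b` shares `leaf` with `a`
    assert(sharesNode(a, b));
    assert(!sharesNode(a, c));
}
```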