Memory/Allocation attributes for variables

Elmar chrehme at gmx.de
Thu Jun 3 17:26:03 UTC 2021


Thank you for answering.

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad 
wrote:
...
>> The thing I'd like to gain with those attributes is a 
>> guarantee, that the referenced value wasn't allocated in a 
>> certain address region/scope and lives in a 
>> lifetime-compatible scope which can be detected by checking 
>> the pointer value against an interval or a range of intervals. 
>> For example a returned reference to an integer could have been 
>> created with "malloc" or even a C++ allocator or interfacing 
>> functions could annotate parameters with such attributes.
>
> Well, I guess you are new, but Walter will refuse having many 
> pointer types. Even the simple distinction between gc and raw 
> pointers will be refused. The reason being that it would lead 
> to a combinatorial explosion of function instances and prevent 
> separate compilation.

Separate compilation is a good point. Binary compatibility is a 
common property considered for security safeguards. But static 
checking with attributes would need no memory addresses at all 
(especially if the compiler can infer the attribute for every 
value-typed variable automatically from where it is defined). 
Dynamic checks of pointers across binary interfaces are 
difficult: they would work flawlessly for library-internal memory 
regions, but for pointer values from outside they can only rely 
on runtime information (the memory regions used by allocators) or 
cannot perform checks at all (because the address ranges to check 
against are unknown). They would work better if binaries 
supported relocations for application-related memory addresses 
that are filled in at link time. Static checks strike the balance 
here.

> I personally have in the past argued that it would be an 
> interesting experiment to make all functions templates and 
> template pointer parameter types.
>
> That you can do with library pointer types, as a proof of 
> concept, yourself. Then you will see what the effect is.

Okay, that's fine. Pointers in D are not debatable, I would not 
try. I think any new language should remove the concept of 
pointers entirely rather than introduce new pointer kinds. 
Pointers coming from C should be treated as reference variables; 
pointers passed to C should be either an unbounded slice (if 
bounded, the function should take an additional `size_t` 
argument) or addresses obtained from variables. As a C programmer 
I'd say that C's pointer concept as it stands was never needed; 
it was just created as an all-in-one solution, an unsafe 
reference variable + a reference + an iterator, the simplest 
generic thing that beats them all (without revealing the use case 
when looking at the pointer type).

Attributes would only check the correctness of pointer value 
assignments, without duplicating the function's code as `auto 
ref` does. (One can still interpret them as part of the type.)

On Tuesday, 1 June 2021 at 06:12:05 UTC, Ola Fosheim Grostad 
wrote:
>> lifetime region with equal lifetime. The comparison between 
>> stack addresses assumes that an address deeper in the stack 
>> has a higher or equal lifetime. The caller could also provide 
>> it's stack frame bounds which allows to consider this interval 
>> as one single lifetime.
>
> How about coroutines? Now you have multiple stacks.

Thanks, I missed that; at least true coroutines have multiple 
stacks. Other things can also dissect stack-frame memory 
(function-specific allocators in the stack region). But in our 
case it is already a question whether such special stack frames 
should still be allocated in the stack region, statically (as I 
once implemented for C), or in a heap region (like the stack 
frames of continuations). You could at least place coroutine 
stack frames in some allocator region in static memory.

A probably less fragile but more costly solution for checking the 
scope of stack addresses would be to store the stack depth of an 
address in the upper k bits of a wide pointer value (allowing a 
simple comparison), but this is only a further, unrelated idea.

> Dynamic checks are unlikely to be accepted, I suggest you do 
> this as a library.

Right, if nobody has tried it so far, I'd like to try it myself. 
Then I can firm up my D experience with further practice. I'd 
compare the nature of static and dynamic attribute checks to 
that of C++ `static_cast` and `dynamic_cast` on class pointers. 
I was thinking such a user library could use `__traits` with 
templated operator overloads.

>> Where the feature shines most is function signatures because 
>> they separate code and create intransparency which can be 
>> countered by memory attributes for return type and argument 
>> types.
>
> Unfortunately, this is also why it will be rejected.

So, is it D's tenor that function signatures are meant to create 
*in*transparency and should continue to do so? Does the community 
think allocation and memory transparency is a bad thing, or just 
not needed? IMO, allocation and memory transparency is relevant 
for a serious systems programming language (even though C doesn't 
have it, C++ doesn't have it, and C# is no systems programming 
language :-D ). Isn't the missing memory transparency from 
outside a function the reason why global variables are frowned 
upon by many? As with referential transparency (side effects), 
less transparency makes programs harder to debug and decouple, 
and APIs harder to use correctly. (Just consider the `map` issue 
with fixed-size arrays...)

>> Okay, I didn't define aliasing. With "aliasing" I mean that 
>> "aliasing references" (or pointers) either point to the exact 
>> same address or that the immediately pointed class/struct 
>> (pointed to by the reference/pointer) does not overlap. I 
>> would consider anything else more complicated than necessary.
>
> Insufficient for D with library container types and library 
> smart pointers.

Yeah. It makes no sense if we consider the pointer layers between 
the exposed pointer and the actual data (I assume smart pointers 
in D are implemented with such a middle layer in between). But if 
it only refers to the first payload data layer, the actual root 
node of any graph-like data structure, is it still flawed? At 
least, if I can annotate all pointer variables in my data 
structures, and if checks are performed for every single 
reference/pointer assignment on any access so that no pointer 
value range in the entire structure is ever violated, isn't that 
closer to memory safety than without? Of course, I could still 
pass references to those pointers to a binary which writes into 
them without knowing any type information, but that's a 
deliberate risk which static type checking cannot mitigate; only 
dynamic value checking of the pointed-to data after the function 
returns could. (Probably another useful safety feature for my 
idea.)

Of course attributes are optional; nobody has to annotate 
anything, at the risk of obtaining falsely scoped pointer values.

But would you agree it would be better than not having it? Of 
course it doesn't make everything safe, particularly if one can 
omit it, but annotating variables with attributes could help with 
ownership (a better design, I think, than Walter's proposal of 
yet another function attribute, @live, instead of a variable 
attribute). By ownership I mean preventing leakage of (sensitive) 
data out of a function (not just reference values as with 
`scope`); it could provide some sanity checks and even more 
transparency for API use (because then I can see what kind of 
allocated memory to expect for parameters and the return value). 
I think it could improve interfacing with C++ as well.
In the end, I only want certainty about the references and 
pointers when I look at a function signature.

I probably should (try to) implement it myself as a proof of 
concept.

Regards, Elmar


More information about the Digitalmars-d mailing list