Memory/Allocation attributes for variables

Sun Jun 6 01:10:45 UTC 2021

Thank you for your input.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Beside the combinatorial explosion in the required logic to 
> check for, what happens if we copy/moving data between 
> different memory annotated variables, e.g. nogc to gc, newcpp 
> to gc.
> Did we auto copy, cast or throw an error. If we do not throw an 
> error, an annotation might not only restrict access but also 
> change semantics by introducing new references.
> So annotations become implied actions, that can be ok but is 
> eventually hard to accept for the current uses of annotations.

There is no combinatorial explosion; that would be a bad idea ;-).

Annotated references behave like a superclass of non-annotated 
references or, say, a subset of attributes is a superclass of a 
superset of attributes. The best description of the effect (in 
the dynamic case) is to view memory attributes as a precondition 
which requires the address value to lie in certain interval(s). 
You said that attributes currently have only compile-time 
semantics, so a static check would fit, right?

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> There is no such thing as memory transparency, strictly 
> speaking, even if you want to allocate things on the stack, 
> what is if your backend doesn't have a stack at all? Or we just 
> rename the heap to stack?

Okay, "memory transparency" is a bad name. It could seem that it 
reveals actual memory addresses. I mean "allocation" or "scope" 
transparency.

Concerning the call stack: languages which don't provide the 
abstraction of scoped variables (which is implemented by a (call) 
stack) basically have only global variables and registers. I'm 
not aware of any high-level language or processor which doesn't 
support that abstraction, because it's the most basic abstraction 
of any high-level language. If you have "functions", then you 
also have a call stack, or let's call it "automatic scope". It 
doesn't matter whether automatic scope is allocated in the heap 
area (which can happen with closures and continuations), in the 
static memory area (for non-recursive functions) or in its own 
area at the end of the memory layout; it only matters that it's 
managed automatically by the function. I also think that CPUs 
which don't support a call stack cannot be programmed with D at 
all.

If attributes are used with static checks, the check does not 
care about the actual memory address value, only about the 
location in source code where a value was allocated or about the 
attributes which it gets from the user. The automatic lifetime is 
the criterion which distinguishes it from heap, GC or static 
memory.

For dynamic checks, I did indeed make an assumption: that in real 
programs the actual lifetime/scope can be inferred from memory 
addresses, because allocation regions of related scope usually 
put variables in common memory areas (at least in common memory 
segments). This would turn pointer types into value ranges 
instead of unconstrained 32-bit integers. Ultimately, information 
from a linker script could be needed for authentic dynamic checks 
(using relocated addresses for checking). I can imagine this 
being difficult on top of that.

Data from stack frames in the heap would be treated as 
dynamically allocated and data from static stack frames would be 
treated as stack. This could lead to unexpected results (false 
errors) unless more information is passed with the pointer. 
Dynamic checks would therefore require a separate implementation 
(a separate type) which records in a few bits the allocation 
scope in which a value was created. In the end, the dynamic 
solution is less lightweight in memory, but it makes the value 
check easier.
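
A minimal sketch of such a "separate type" for dynamic checks, 
assuming the tag is supplied by whoever creates the reference 
rather than derived from the address. The names `Origin`, 
`TaggedRef`, `satisfies` and `mask` are made up for illustration 
and are not part of D or Phobos.

```D
import std.stdio : writeln;

// Hypothetical allocation scopes; the names are illustrative only.
enum Origin : ubyte { automatic, staticData, gcHeap, cHeap }

// A "fat" reference: the pointer plus a small tag recording where
// the value was allocated. Nothing here inspects the address.
struct TaggedRef(T)
{
    T* ptr;
    Origin origin;

    // Dynamic precondition: `allowed` is a bit mask with one bit
    // per Origin member.
    bool satisfies(uint allowed) const
    {
        return (allowed & (1u << origin)) != 0;
    }
}

// Build a mask from a list of origins.
uint mask(Origin[] origins...)
{
    uint m = 0;
    foreach (o; origins)
        m |= 1u << o;
    return m;
}

void main()
{
    int local = 7;
    auto onStack = TaggedRef!int(&local, Origin.automatic);
    auto onGc = TaggedRef!int(new int(9), Origin.gcHeap);

    // A "@nogc" reference accepts everything except GC memory.
    immutable noGc = mask(Origin.automatic, Origin.staticData,
                          Origin.cHeap);

    assert(onStack.satisfies(noGc));
    assert(!onGc.satisfies(noGc));
    writeln("dynamic origin checks passed");
}
```

The extra field is the memory cost mentioned above; in exchange 
the check is a single bit test instead of an address-range lookup.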

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Well, I think having both is problematic/complex. But C has 
> only one of those and C++ has both.
> It's not quite correct what arrays belong, so that's a mistake.

You mean references and pointers, right? References (from C++) 
are immutable pointers (in theory). C++ has pointers for 
backwards compatibility (and probably because the designer 
originally didn't understand the problem), but using them as "raw 
pointers" is now discouraged (when I write "pointer" I mean "raw 
pointer").

(Raw) pointers, by contrast, are modifiable "reference variables" 
(like the variables in Java) which additionally provide access to 
the pointer address and allow modifying it. Reference variables, 
however, don't allow casting to non-pointer types.

Arrays in C and C++ are actually more like C++ references, i.e. 
(locally) immutable pointers.

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Annotations seam to be neat, but they parametrize your code:
>
> ```D
> @allocator("X"), @lifetime("param1", "greather", "param2") void 
> f(Type1 param1, Type2 param2)
> ```
>
> becomes
>
> ```D
> void f(Allocator X,Lifetime lifetime(param1), Lifetime 
> lifetime(param2))(Type1 param1, Type2 param2) if 
> currentAllocator=X && lifetime(param1)>=lifetime(param2) {...}
> ```
>
> which literally turns every function allocating something into 
> a template increasing "templatism" unless we get runtime 
> generics as Swift.

I agree that templatism is bad.

Are attributes really lowered to template arguments by the 
compiler? I also didn't mean to introduce new syntax with a comma 
between attributes. By memory attributes I really mean attributes 
like `scope`, `ref`, `private`, `pure`, `@nogc` ... which are 
used with reference/pointer types, not functions. You would be 
right that any assignment to an annotated reference needs a 
templated overload. I can't think of another way to implement it. 
In the worst case, it would become something like

```D
Ref!(nogc, Flower) tulip;       // anything not allocated by the GC
Ref!(static, new, Bird) raven;  // no automatic allocation
```
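
To make the "templated overload" concrete, here is a rough sketch 
of how such a `Ref` could be emulated in today's D. It is only 
one possible reading: keywords such as `static` and `new` cannot 
be passed as template arguments as written, so the sketch 
assumes a bit mask of allowed allocation origins instead. The 
constants `memAutomatic`, `memStatic`, `memGc`, `memCHeap` and 
`noGc` are hypothetical.

```D
// Hypothetical origin flags: one bit per place a value may live.
enum uint memAutomatic = 1;
enum uint memStatic    = 2;
enum uint memGc        = 4;
enum uint memCHeap     = 8;

// "nogc" in the sense used here: anything except GC memory.
enum uint noGc = memAutomatic | memStatic | memCHeap;

// `allowed` is the set of origins this reference may point to.
struct Ref(uint allowed, T)
{
    T* ptr;

    // Templated assignment: accepted only if every origin the
    // source may have is also allowed for the target.
    void opAssign(uint other)(Ref!(other, T) rhs)
        if ((other & ~allowed) == 0)
    {
        ptr = rhs.ptr;
    }
}

void main()
{
    auto fromGc = Ref!(memGc, int)(new int(3));

    Ref!(memGc | memCHeap, int) anyHeap;
    anyHeap = fromGc;           // ok: {gc} is within {gc, cHeap}
    assert(*anyHeap.ptr == 3);

    Ref!(noGc, int) tulip;
    // Rejected statically: GC memory is not allowed for `tulip`.
    static assert(!__traits(compiles, tulip = fromGc));
}
```

The check is a single template constraint, so the number of 
attribute combinations does not multiply the amount of code; the 
cost is that every such assignment instantiates a template.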

I would already be happy with the most important attributes.

- "Oh, I see, it returns me GC-allocated memory"
- "Oh, the passed argument is allocated automatically, so I can't 
put the address into a static reference."
- "Oh, a slice over a fixed-size array will not work with that 
function."

---

Of course, the amount of safety gained from these attributes 
depends on the programmer. For example, they don't prevent 
use-after-free with `@newc` and `@newcpp` in every case, because 
a referenced value could suddenly be deleted by code which 
interrupts the function's execution. The true scope depends not 
only on the location of allocation but also on the location of 
the associated deallocation. If the deallocation can happen in a 
code block which interrupts normal function execution, then I 
would treat it like either `shared` or `@memory`. The compiler 
can't know everything by itself. Memory safety will thus only 
work if programmers use the proper attributes.

But the fact that D already implements a very small, weaker 
subset of memory (or reference) attributes, like `scope` and 
`return ref`, shows that this idea does fit D's design.
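
For comparison, a small sketch of what `return ref` already 
expresses today. The function name `first` is made up, and the 
sketch covers only `return ref`; as far as I know, full checking 
of `scope` additionally needs the `-preview=dip1000` switch.

```D
// `return ref` ties the returned reference to the argument `a`:
// the result must not outlive whatever was passed as `a`.
ref int first(return ref int a, ref int b) @safe
{
    return a;       // allowed: `a` is marked `return`
    // return b;    // `b` is not marked `return`; recent compilers
                    // are expected to reject this in @safe code
}

void main() @safe
{
    int x = 1, y = 2;
    first(x, y) = 10;   // writes through the returned reference
    assert(x == 10 && y == 2);
}
```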

---

PS:

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Why not removing the distinction between values and 
> references/pointers at all? But I think it drifts to hard in a 
> logically high level language and isn't the right way to go in 
> a system level language although very interesting.

I'm getting off-topic, but I totally agree with you! I have an 
idea for a systems programming language which treats every 
variable as a reference variable. It gets rid of the annoying 
value categories and the value concept by using a unified 
variable access interface, which allows different reference 
implementations for different optimization scenarios (like using 
registers to store and modify the referenced value).

On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
>>
>>```D
>>@stack arr2 = dynamicArr.dup;   // create a copy on stack, the 
>>stack is "scope"d
>>```
>> [...]
>
>
> What if dup is creating things on the heap (I don't know by the 
> way). You need to make the allocator dynamically scoped.

This was supposed to be a side idea unrelated to the main idea. 
`dup` does allocate memory with the GC, so you'd be right if we 
were talking about annotating references, but the snippet here is 
supposed to inject the annotated allocation of a *value* 
definition into the RHS, i.e. into `dup`'s internal 
implementation.

If you like, I can elaborate on that idea:

Idea: an annotation on a value declaration specifies where the 
return value or expression value of the RHS is allocated and 
thus where the variable will be located in memory. This 
theoretical idea would give more control over the variable's 
allocation.

This idea is not odd, because a small subset of such attributes 
for value variables IS already implemented in D: static/global 
variables, automatic local variables (of course) and member-scope 
variables in structs and classes. C also features `@memory` 
(`volatile`) and to some extent `@register` (C99's `restrict`, 
which lets the compiler keep pointer-dereferenced values in 
registers for further dereferencing, only comes close to it; 
language extensions can map variables to specific registers).

I thought this would not be popular, because it seems like D 
doesn't want to be too much of an alternative to C++ for systems 
programming, and generalizing this concept seems like a bigger 
change. That's why this idea was only a side note.

```D
@gc short opal = 3;
// eqv. to  ref short opal = cast(ref short)GC.make!short();
//          opal = 3;
@newc int emerald = 5;
// eqv. to  ref int emerald = cast(ref int)malloc(int.sizeof);
//          emerald = 5;
@new float ruby = 8.;
// @new uses the "new" operator, which is not always dynamic
// allocation
@rc int amethyst = 13;
// reference counted, basically an abstraction over an underlying
// shared pointer
...
free(&emerald);   // needed because @newc is not automatically managed
```

A benefit is that these variables are still used like values, 
i.e. they are passed by value or by reference depending on the 
function parameter type, although physically they are of course 
a reference (because everything that is not stored in a register 
is actually a reference; variables on the call stack, for 
example, are referenced via the stack pointer).

Goal: The responsibility for allocation is shifted from the 
service, the callee (which doesn't know about any concrete 
client's allocation needs), to the client, the caller (which 
knows about its own allocation needs and should actually know 
what it gets). GC was introduced to remove the symptoms of this 
problem (memory management problems) without solving it 
(consequence: it gets used far more often than needed and is 
inefficient). The only reasonable way to solve it is to let the 
caller side (LHS of the assignment) decide what it needs, not the 
callee side (RHS of the assignment), because the caller side has 
to handle the value afterwards. A generic solution would be some 
kind of Dependency Injector which handles the allocation, uses 
the callee to initialize the value and passes it to the caller. 
It would turn those attributes into a powerful abstraction. A 
very easy implementation of the Dependency Injector is overriding 
`theAllocator` while the RHS is computed.
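
A rough sketch of that last sentence, using 
`std.experimental.allocator` from Phobos. The helper 
`withAllocator` and the callee `makeAnswer` are hypothetical 
names for illustration; the sketch also assumes the callee 
allocates through `theAllocator` in the first place, which 
ordinary code using `new` or `dup` does not.

```D
import std.experimental.allocator : RCIAllocator, allocatorObject,
    dispose, make, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical helper: evaluates `rhs` while `alloc` is installed
// as the thread-local default allocator, then restores the old one.
T withAllocator(T)(RCIAllocator alloc, lazy T rhs)
{
    auto saved = theAllocator;
    theAllocator = alloc;
    scope (exit) theAllocator = saved;
    return rhs;     // the lazy argument is evaluated here
}

// A "callee" which does not choose its own allocation strategy.
int* makeAnswer()
{
    return theAllocator.make!int(42);
}

void main()
{
    auto cHeap = allocatorObject(Mallocator.instance);

    // The caller decides: this value ends up in malloc'ed memory.
    int* emerald = withAllocator(cHeap, makeAnswer());
    assert(*emerald == 42);

    cHeap.dispose(emerald);   // not automatically managed
}
```

The `lazy` parameter is what delays the RHS until the allocator 
has been swapped; a hypothetical `@newc` attribute could be 
lowered to a call of this shape.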