Memory/Allocation attributes for variables
Elmar
chrehme at gmx.de
Sun Jun 6 01:10:45 UTC 2021
Thank you for your input.
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Besides the combinatorial explosion in the required logic to
> check for, what happens if we copy/move data between
> different memory annotated variables, e.g. nogc to gc, newcpp
> to gc?
> Do we auto copy, cast or throw an error? If we do not throw an
> error, an annotation might not only restrict access but also
> change semantics by introducing new references.
> So annotations become implied actions, that can be ok but is
> eventually hard to accept for the current uses of annotations.
There is no combinatorial explosion; that would be a bad idea ;-).
Annotated references behave like a super class of non-annotated
references, or, put differently, a subset of attributes is a super
class of a superset of attributes. In the dynamic case, the best
way to describe the effect is to view memory attributes as a
precondition which requires the address value to lie in certain
interval(s). You said that attributes currently only have
compile-time semantics, so a static check would fit, right?
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> There is no such thing as memory transparency, strictly
> speaking, even if you want to allocate things on the stack,
> what if your backend doesn't have a stack at all? Or do we just
> rename the heap to stack?
Okay, "memory transparency" is a bad name; it suggests that actual
memory addresses are revealed. I mean "allocation" or "scope"
transparency.
Concerning the call stack: languages which don't provide the
abstraction of scoped variables (which is implemented by a (call)
stack) basically only have global variables and registers. I'm not
aware of any high-level language or processor which doesn't
support that abstraction, because it's the most basic abstraction
of any high-level language. If you have "functions", then you also
have a call stack, or let's call it "automatic scope". It doesn't
matter whether automatic scope is allocated in the heap area
(which can happen with closures and continuations), in the static
memory area (for non-recursive functions) or in its own area at
the end of the memory layout; it only matters that it's managed
automatically by the function. I also think that CPUs which don't
support a call stack cannot be programmed with D at all.
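D itself shows the closure case: when a delegate that captures a local variable escapes its function, the compiler moves that frame from the call stack to the GC heap, while the variable keeps its automatic-scope semantics. A minimal sketch; the function names are made up for illustration:

```D
// `count` would normally live in makeCounter's stack frame. Because the
// returned delegate captures it and outlives the call, the compiler
// allocates the frame on the GC heap instead (a closure).
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}

// For the same reason, this variant is a compile error, since a @nogc
// function is not allowed to allocate a closure:
// int delegate() makeCounterNoGC() @nogc { int c; return () => ++c; }

void main()
{
    auto next = makeCounter();
    next();
    assert(next() == 2); // the captured `count` survived between calls
}
```

Only the storage area of the frame changes here, not the fact that it is managed automatically, which is the distinction drawn above.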
If attributes are checked statically, the check doesn't care about
the actual address value, only about the location in the source
code where a value was allocated, or about the attributes the user
gives it. Automatic lifetime is the criterion that distinguishes
automatic scope from heap, GC or static memory.
For dynamic checks, I did make the assumption that, in real
programs, the actual lifetime/scope can be inferred from memory
addresses, because the allocation regions of related scopes usually
put variables into common memory areas (at least into common
memory segments). This would turn pointer types into value ranges
instead of unconstrained 32-bit integers. Ultimately, information
from a linker script could be needed for authentic dynamic checks
(using the relocated address for checking). I imagine this would
make it even more difficult.
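A minimal sketch of such an interval check, assuming the bounds of a memory region are already known. `Region` and `contains` are hypothetical names, and a static buffer stands in for a segment whose real bounds would come from the linker or runtime:

```D
/// Hypothetical: a half-open address interval [lo, hi) standing in for a
/// memory segment. Here the bounds are taken from a known buffer.
struct Region(T)
{
    const(T)* lo;
    const(T)* hi;

    /// Dynamic check: does `p` point into this region?
    bool contains(const(T)* p) const
    {
        return p >= lo && p < hi;
    }
}

__gshared int[8] staticBuffer; // lives in static data, not on the stack

void main()
{
    auto segment = Region!int(staticBuffer.ptr,
                              staticBuffer.ptr + staticBuffer.length);
    assert(segment.contains(&staticBuffer[3]));

    int onStack; // automatic variable, outside the static buffer
    assert(!segment.contains(&onStack));
}
```

The check itself is cheap; the hard part, as said above, is knowing trustworthy region bounds.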
Data from stack frames in the heap would be treated as
dynamically allocated, and data from static stack frames would be
treated as stack. This could lead to unexpected results (false
errors) unless more information is passed along with the pointer.
Dynamic checks would require a separate implementation (a separate
type) which records in some bits in which allocation scope a value
was created. Ultimately, the dynamic solution is less lightweight
in memory, but it makes the value check easier.
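A minimal sketch of such a separate type, assuming the origin is stored in an extra tag field next to the pointer (packing it into unused low pointer bits would save the field, but is left out for clarity). `AllocScope`, `TaggedRef` and `requireNonAutomatic` are hypothetical names:

```D
import std.exception : assertThrown;

/// Hypothetical allocation scopes that a dynamically checked reference records.
enum AllocScope : ubyte { automatic, staticData, gc, cHeap }

/// Hypothetical: a pointer plus an explicit origin tag. The tag costs extra
/// memory, but checking it is a single comparison, not an address lookup.
struct TaggedRef(T)
{
    T* ptr;
    AllocScope origin;

    /// Run-time precondition: refuse references into an automatic scope,
    /// e.g. before storing the pointer somewhere long-lived.
    T* requireNonAutomatic() @safe pure
    {
        if (origin == AllocScope.automatic)
            throw new Exception("reference into a stack frame cannot be stored here");
        return ptr;
    }
}

void main()
{
    int local = 1;
    auto onStack = TaggedRef!int(&local, AllocScope.automatic);
    assertThrown!Exception(onStack.requireNonAutomatic());

    auto onHeap = TaggedRef!int(new int(7), AllocScope.gc);
    assert(*onHeap.requireNonAutomatic() == 7);
}
```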
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Well, I think having both is problematic/complex. But C has
> only one of those and C++ has both.
> It's not quite correct what arrays belong, so that's a mistake.
You mean references and pointers, right? References (from C++) are
immutable pointers (in theory). C++ has pointers for backwards
compatibility (and probably because its designer originally didn't
understand the problem), but using them as "raw pointers" is now
discouraged (when I wrote "pointer", I meant "raw pointer").
(Raw) pointers, by contrast, are modifiable "reference variables"
(like the variables in Java) which additionally provide access to
the pointer address and allow modifying it. Reference variables,
however, don't allow casting to non-pointer types.
Arrays in C and C++ are actually more like C++ references, i.e.
(locally) immutable pointers.
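D draws the same line between its `ref` parameters and raw pointers. A small sketch of the difference; the function names are hypothetical:

```D
/// Like a C++ reference: bound once to the caller's variable; the
/// function can neither reseat it nor see its address as a number.
void bumpRef(ref int x)
{
    x += 1;
}

/// A raw pointer is itself a variable: it can be reseated (this only
/// changes the local copy) and converted to an integer address.
void bumpPtr(int* p)
{
    *p += 1;
    p = null;
}

size_t addressOf(int* p)
{
    return cast(size_t) p;
}

void main()
{
    int a = 1;
    bumpRef(a);
    bumpPtr(&a);
    assert(a == 3);
    assert(addressOf(&a) != 0);
}
```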
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Annotations seem to be neat, but they parametrize your code:
>
> ```D
> @allocator("X"), @lifetime("param1", "greater", "param2") void
> f(Type1 param1, Type2 param2)
> ```
>
> becomes
>
> ```D
> void f(Allocator X,Lifetime lifetime(param1), Lifetime
> lifetime(param2))(Type1 param1, Type2 param2) if
> currentAllocator=X && lifetime(param1)>=lifetime(param2) {...}
> ```
>
> which literally turns every function allocating something into
> a template increasing "templatism" unless we get runtime
> generics as Swift.
I agree that templatism is bad.
Are attributes really lowered to template arguments by the
compiler? I also didn't mean to introduce new syntax with a comma
between attributes. By memory attributes I really mean attributes
like `scope`, `ref`, `private`, `pure`, `@nogc` ... which are used
with reference/pointer types, not functions. You would be right
that any assignment operation to an annotated reference needs a
templated overload; I can't think of another way to implement it.
In the worst case, it would become something like
```D
Ref!(nogc, Flower) tulip;      // anything but not allocated by the garbage collector
Ref!(static, new, Bird) raven; // no automatic allocation
```
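A minimal library-level sketch of such a `Ref`, assuming each attribute names an allocation origin the reference accepts, as in the `raven` example. Everything here is hypothetical: `static` and `new` are keywords, so the origins are enum members, and `T` comes first because a D variadic template parameter must be last. A conversion only compiles if the target accepts every origin the source may point to:

```D
/// Hypothetical origins; `static` and `new` are keywords, hence the names.
enum Origin { automatic, static_, gc, newc, newcpp }

/// Hypothetical helper: true if every origin in `sub` also occurs in `sup`.
bool isSubset(const Origin[] sub, const Origin[] sup) pure nothrow @safe
{
    foreach (o; sub)
    {
        bool found = false;
        foreach (s; sup)
            found = found || s == o;
        if (!found)
            return false;
    }
    return true;
}

/// Hypothetical: a reference to T that may only point to values
/// allocated in one of the listed origins.
struct Ref(T, origins...)
{
    enum Origin[] allowed = [origins];
    T* ptr;

    /// Widening conversion: the target must accept every origin this
    /// reference may point to, otherwise compilation fails.
    Ref!(T, targets) widen(targets...)()
    {
        static assert(isSubset(allowed, [targets]),
            "target reference does not accept every origin of the source");
        return Ref!(T, targets)(ptr);
    }
}

struct Bird
{
    string name;
}

void main()
{
    static Bird raven = Bird("raven");
    auto onlyStatic = Ref!(Bird, Origin.static_)(&raven);

    // Fine: the wider reference also accepts statically allocated birds.
    auto staticOrNew = onlyStatic.widen!(Origin.static_, Origin.newcpp)();
    assert(staticOrNew.ptr.name == "raven");

    // Does not compile: the target would not accept newcpp-allocated birds.
    // auto narrowed = staticOrNew.widen!(Origin.static_)();
}
```

The check is purely static, so it costs nothing at run time, but each distinct attribute list is a separate template instance, which is the "templatism" concern from above.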
I would already be happy with the most important attributes.
- "Oh, I see, it returns GC-allocated memory."
- "Oh, the passed argument is allocated automatically, so I can't put its address into a static reference."
- "Oh, a slice over a fixed-size array will not work with that function."
---
Of course, the amount of safety gained from these attributes
depends on the programmer. For example, they don't prevent
use-after-free with `@newc` and `@newcpp` in every case, because a
referenced value could suddenly be deleted by code which
interrupts the function's execution. The true scope depends not
only on the location of allocation but also on the location of the
associated deallocation. If the deallocation can happen in a code
block which interrupts normal function execution, then I would
treat it either like `shared` or like `@memory`. The compiler can't
know everything by itself, so memory safety will only work if
programmers use the proper attributes.
But the fact that D already implements a very small, weaker
subset of memory (or reference) attributes, like `scope` and
`return ref`, shows that this idea does fit D's design.
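For reference, a small sketch of those two existing attributes in `@safe` code (the function names are made up; the `scope` checks are enforced fully when compiling with `-preview=dip1000`). Without `return`, returning `arr[0]` by `ref` would be rejected as escaping a reference to a parameter:

```D
@safe:

/// `return ref`: the result may refer into `arr`, so the compiler ties
/// the returned reference's lifetime to the argument's lifetime.
ref int first(return ref int[3] arr)
{
    return arr[0];
}

/// `scope`: the slice may be read here but must not escape the call.
int sum(scope const(int)[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x;
    return total;
}

void main()
{
    int[3] local = [1, 2, 3];
    first(local) = 10;        // write through the returned reference
    assert(local[0] == 10);
    assert(sum(local[]) == 15);
}
```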
---
PS:
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> Why not remove the distinction between values and
> references/pointers altogether? But I think it drifts too far into a
> logically high level language and isn't the right way to go in
> a system level language although very interesting.
I'm getting off-topic, but I totally agree with you! I have an
idea for a systems programming language which treats every
variable as a reference variable, to get rid of the annoying value
categories and the value concept, by using a unified variable
access interface which allows different reference implementations
for different optimization scenarios (like using registers to
store and modify the referenced value).
On Thursday, 3 June 2021 at 19:36:32 UTC, sighoya wrote:
> On Thursday, 3 June 2021 at 17:26:03 UTC, Elmar wrote:
>>
>>```D
>>@stack arr2 = dynamicArr.dup; // create a copy on stack, the stack is "scope"d
>>```
>> [...]
>
>
> What if dup is creating things on the heap (I don't know by the
> way). You need to make the allocator dynamically scoped.
This was supposed to be a side idea unrelated to the main idea.
`dup` does allocate memory with the GC, so you'd be right if we
were talking about annotating references, but the snippet here is
supposed to inject the annotated allocation of a *value*
definition into the RHS, i.e. into `dup`'s internal implementation.
If you like, I'll elaborate more on that idea:
Idea: when a value declaration is annotated, the annotation
specifies where the return value or expression value of the RHS is
allocated and thus where the variable will be located in memory.
This theoretical idea would give more control over the variable's
allocation.
This idea is not odd, because a small subset of such attributes
for value variables IS already implemented in D, like
static/global variables, automatic local variables (of course)
and member-scope variables in structs and classes. C also
features `@memory` (`volatile`) and, to some extent, `@register`:
C99's `restrict` only comes close to it (it lets the compiler keep
dereferenced pointer values in registers for further
dereferencing), and language extensions can map variables to
specific registers.
I assumed this would not be popular, because D seems not to want
to be too much of an alternative to C++ for systems programming,
and generalizing this concept seems like a bigger change. That's
why this idea was only a side note.
```D
@gc short opal = 3;
// eqv. to ref short opal = *cast(short*) GC.malloc(short.sizeof); opal = 3;
@newc int emerald = 5;
// eqv. to ref int emerald = *cast(int*) malloc(int.sizeof); emerald = 5;
@new float ruby = 8.;
// @new uses the "new" operator, which is not always dynamic allocation
@rc int amethyst = 13;
// reference counted, basically an abstraction over an underlying shared pointer
...
free(&emerald); // needed because @newc is not automatically managed
```
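None of these attributes exist in D today. A minimal sketch of roughly what three of the declarations above correspond to with current D and Phobos; note that the value has to be reached through a pointer or a wrapper instead of behaving like a plain value (`RefCounted` is Phobos' reference-counting wrapper, used here as a stand-in for `@rc`):

```D
import core.stdc.stdlib : free, malloc;
import std.typecons : RefCounted;

void main()
{
    // ~ @gc short opal = 3;
    // GC-owned storage; today it is reached through a pointer.
    short* opal = new short(3);

    // ~ @newc int emerald = 5;
    // C heap; nothing frees it automatically, hence the explicit free.
    int* emerald = cast(int*) malloc(int.sizeof);
    assert(emerald !is null);
    *emerald = 5;
    scope (exit) free(emerald);

    // ~ @rc int amethyst = 13;
    // Reference-counted payload, released when the last copy goes away.
    auto amethyst = RefCounted!int(13);
    auto sameGem = amethyst;   // copying shares the payload
    sameGem += 1;              // `alias this` forwards to the int payload
    assert(amethyst == 14);

    assert(*opal == 3 && *emerald == 5);
}
```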
A benefit is that these variables are still used like values,
i.e. they are passed by value or by reference depending on the
function parameter type, although physically they are references,
of course (everything that is not stored in a register is actually
accessed by reference; variables on the call stack, for example,
are referenced via the stack pointer).
Goal: the responsibility for allocation is shifted from the
service, the callee (which doesn't know about any concrete
client's allocation needs), to the client, the caller (which knows
its own allocation needs and actually should know what it gets).
GC was introduced to remove the symptoms of this problem (memory
management problems) without solving it (consequence: it gets used
far more often than needed and is inefficient). The only
reasonable way to solve it is to let the caller side (LHS of the
assignment) decide what it needs, not the callee side (RHS of the
assignment), because the caller side has to handle it afterwards.
A generic solution would be some kind of Dependency Injector which
handles the allocation, uses the callee to initialize the value
and passes it to the caller. This would turn those attributes into
a powerful abstraction. A very easy implementation of the
Dependency Injector is overriding `theAllocator` while the RHS is
computed.
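A minimal sketch of that override using `std.experimental.allocator`. `injectedDup` and `withAllocator` are hypothetical helpers, not Phobos functions; the sketch assumes library code allocates through `theAllocator` (the built-in `.dup` does not, it always uses the GC):

```D
import std.experimental.allocator : RCIAllocator, allocatorObject, dispose,
    makeArray, theAllocator;
import std.experimental.allocator.mallocator : Mallocator;

/// Hypothetical replacement for `.dup`: copies `src` with whatever
/// `theAllocator` currently is, instead of always using the GC.
T[] injectedDup(T)(const(T)[] src)
{
    return theAllocator.makeArray!T(src);
}

/// Hypothetical "dependency injector": installs `alloc` as theAllocator
/// while the right-hand side `rhs` runs, then restores the previous one.
auto withAllocator(alias rhs)(RCIAllocator alloc)
{
    auto saved = theAllocator;
    theAllocator = alloc;
    scope (exit) theAllocator = saved;
    return rhs();
}

void main()
{
    int[] dynamicArr = [1, 2, 3];
    auto mallocAlloc = allocatorObject(Mallocator.instance);

    // Roughly what `@newc arr2 = dynamicArr.dup;` would mean:
    // the caller picks the allocator, the callee only fills the memory.
    int[] arr2 = withAllocator!(() => injectedDup(dynamicArr))(mallocAlloc);
    scope (exit) mallocAlloc.dispose(arr2); // C heap is not managed by the GC

    assert(arr2 == dynamicArr && arr2.ptr !is dynamicArr.ptr);
}
```

Since `theAllocator` is thread-local, the override only affects the current thread, but it also affects every other allocation made through `theAllocator` while the RHS runs.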