GC.addRange in pure function
Petar
Wed Feb 10 16:25:44 UTC 2021
On Wednesday, 10 February 2021 at 13:44:53 UTC, vit wrote:
> On Wednesday, 10 February 2021 at 12:17:43 UTC, rm wrote:
>> On 09/02/2021 5:05, frame wrote:
>>> On Sunday, 7 February 2021 at 14:13:18 UTC, vitamin wrote:
>>>> Why using 'new' is allowed in pure functions but calling
>>>> GC.addRange or GC.removeRange isn't allowed?
>>>
>>> Does 'new' violate the 'pure' paradigm? Pure functions can
>>> only call pure functions and GC.addRange or GC.removeRange is
>>> only 'nothrow @nogc'.
>>
>> new allocates memory via the GC and the GC knows to scan this
>> location. Seems like implicit GC.addRange.
>
> Yes, this is my problem, if `new` can create object in pure
> function, then GC.addRange and GC.removeRange is may be pure
> too.
>
> Can I call GC.addRange and GC.removeRange from pure function
> without problem? (using assumePure(...)() ).
TL;DR Yes, you can, but it depends on what "without problem"
means for you :P
The Dark Arts of practical D code
=================================
According to D's general approach to purity, malloc/free/GC.* are
indeed impure as they read and write global **mutable** state,
but are still allowed in pure functions **if encapsulated
properly**. The encapsulation is done by @trusted wrappers which
must be carefully audited by humans - the compiler can't help you
with that.
The general rule for such *callable-from-pure* code is that the
effect of calling the @trusted wrapper must not drastically leak or
be observed. (Technically such code is still labeled as `pure`, e.g.:

---
pragma(mangle, "malloc") pure @system @nogc nothrow
void* fakePureMalloc(size_t);
---

but I prefer to make the conceptual distinction.)
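For reference, here is the `assumePure` idiom that vit mentions (the
helper below follows the version on the D wiki); `registerRange` is
just an illustrative name for a possible wrapper:

---
import std.traits : FunctionAttribute, functionAttributes,
    functionLinkage, isDelegate, isFunctionPointer,
    SetFunctionAttributes;

// Cast a function pointer / delegate so its type also carries
// `pure`. The cast itself is the part a human must audit.
auto assumePure(T)(T t)
if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// A pure facade over the impure GC.addRange.
void registerRange(void* p, size_t len) pure nothrow
{
    static void impl(void* p, size_t len) nothrow @nogc
    {
        import core.memory : GC;
        GC.addRange(p, len);
    }
    assumePure(&impl)(p, len);
}
---

Whether this is "without problem" is exactly the judgment call
discussed below.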
What "drastically" means depends on what you want `pure` to mean
in your application. Which side effects do you want to protect
against by using `pure`? It is really a high-level concern that
you as a developer must decide on when writing/using @trusted
pure code in your program. For example, generally everyone will
agree that network calls are impure. But what about logging? It's
impure by definition, since it mutates a global log stream. But
is this effect worth caring about? In some specific situations it
may be OK to ignore. This is why in D you can call `writeln` in
`pure` functions, as long as it's inside a `debug` block. But
given that you as a developer can decide whether to pass the
`-debug` option to the compiler, you're essentially in control of
what `pure` means for your codebase, at least to some extent.
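A minimal sketch of that escape hatch (the function is illustrative):

---
int twice(int x) pure
{
    debug
    {
        // Impure I/O, yet accepted inside a `pure` function because
        // it sits in a `debug` block; it's only compiled in when
        // you build with -debug.
        import std.stdio : writeln;
        writeln("twice(", x, ")");
    }
    return x * 2;
}
---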
100% mathematical purity is impossible even in the most strict
functional programming language implementations, since our
programs run on actual hardware and not on an idealized
mathematical machine. For example, even the act of reading
immutable data can be globally observed by measuring the
memory access times - see Spectre [1] and all the other
microarchitectural side-channel [2] vulnerabilities.
[1]:
https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)
[2]: https://en.wikipedia.org/wiki/Side-channel_attack
That said, function purity is far from useless - quite the
contrary. It is about making your programs more deterministic and
easier to reason about. We all want fewer bugs in our code and less
time spent chasing hard-to-reproduce crashes, right?
`pure` is really about limiting, containing / compartmentalizing
and controlling the (non-deterministic) global effects in your
program. Ideally you should strive to structure your programs as
a pure core, driven by an imperative, impure shell. E.g. if
you're working on an accounting application, the core is the part
that implements the main domain / business logic and should be
100% deterministic and pure. The imperative shell is the part
that reads spreadsheet files, exports to PDF, etc. (Only the
actual file I/O needs to be impure - the decoding / encoding of
the data structures can be perfectly pure.)
Now, back to practice and the question of memory management.
Of course, allocating memory is a globally observable effect, and
even locally one can compare pointers, as Paul Backus mentioned,
since D is a systems language. However, as a practical concession,
D's concept of purity is about ensuring high-level invariants,
and so such low-level concerns can be ignored, as long as the
codebase doesn't observe them. What does it mean to observe them?
Here's an example:
---
void main()
{
    import std.stdio : writeln;
    observingLowLevelSideEffects.writeln; // `false`, but could be `true`
    notObservingSideEffects.writeln;      // always `true`
}

// BAD:
bool observingLowLevelSideEffects() pure
{
    immutable a = [2];
    immutable b = [2];
    return a.ptr == b.ptr;
}

// OK
bool notObservingSideEffects() pure
{
    immutable a = [2];
    immutable b = [2];
    return a == b;
}
---
`observingLowLevelSideEffects` is bad because, according to the
language rules, the compiler is free to make `a` and `b` point to
the same immutable array, so the result of the function is
implementation-defined (or worse, unspecified) - which is exactly
what purity should help us avoid. If `observingLowLevelSideEffects`
were not marked as `pure` it wouldn't be "BAD", just "meh". In
contrast, `notObservingSideEffects` is "OK", even though,
ironically, the implementation of array equality first compares
the pointers. So `notObservingSideEffects` is basically doing
the same as `observingLowLevelSideEffects` plus some extra code.
It's really just a question of whether the side effects can be
observed.
If, in order to perform some calculation, a function allocated some
temporary memory on the heap, but then freed it once it was done,
would anyone care? If you're on a microcontroller with very
limited memory, then yes, but otherwise probably not.
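As a sketch of that idea, reusing the `fakePureMalloc` declaration
from above plus a matching `fakePureFree` (an assumption here, built
the same way):

---
pragma(mangle, "malloc") pure @system @nogc nothrow
void* fakePureMalloc(size_t);

pragma(mangle, "free") pure @system @nogc nothrow
void fakePureFree(void*);

// The scratch buffer lives and dies entirely within the call, so
// the allocation is never observable from the outside.
int sumOfSquares(const int[] xs) pure nothrow @nogc
{
    auto tmp = cast(int*) fakePureMalloc(xs.length * int.sizeof);
    if (tmp is null) assert(0, "out of memory");
    scope (exit) fakePureFree(tmp);

    int total = 0;
    foreach (i, x; xs)
    {
        tmp[i] = x * x;   // use the scratch space...
        total += tmp[i];  // ...then fold it into the result
    }
    return total;
}
---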
And what if the function didn't allocate any additional memory?
And what if the function is memoized (i.e. it caches the result
of the calculation for some set of arguments)? If the cache were
shared by all threads and protected by a mutex, it could be a
problem. Especially if the code locks the mutex while the
function is executing, but then the function proceeds to acquire
another mutex - it starts to smell like a deadlock possibility. But
what if the cache were just thread-local - surely this must be
better? The answer is "yes", even though, as far as the language
is concerned, whether a global mutable variable is thread-local or
`shared` doesn't matter for function purity. But one is obviously
more deterministic than the other, even if that's hard to quantify.
So a good heuristic is: the more a side effect is localized
and controlled, the easier it is to argue that the code is
pure, as far as your application is concerned.
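A sketch of the thread-local variant (illustrative names; the
function-pointer cast asserts a purity the compiler can't verify,
which is benign only because the cache never leaks out):

---
int fibMemo(int n) pure
{
    static int impl(int n)
    {
        static int[int] cache; // thread-local, as all D statics are
        if (auto hit = n in cache)
            return *hit;
        immutable r = n < 2 ? n : impl(n - 1) + impl(n - 2);
        cache[n] = r;
        return r;
    }
    // The audited escape hatch: pretend `impl` is pure.
    return (cast(int function(int) pure) &impl)(n);
}
---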
--------
Okay, but really, what about `GC.addRange` and `GC.removeRange`?
The litmus test is whether the side effects are controlled, i.e.
whether your code has strong exception safety [3][4][5],
transactional semantics, etc. - in other words, whether what happens
inside it stays inside it.
[3]:
https://docs.microsoft.com/en-us/cpp/cpp/how-to-design-for-exception-safety?view=msvc-160#strong-guarantee
[4]: https://www.stroustrup.com/except.pdf
[5]: https://www.boost.org/community/exception_safety.html
So if you're implementing an RAII container, then yes, you can
mark its functions as `pure`, since the destructor will unwind the
side effects, so at least at a high level whether GC.addRange /
GC.removeRange were called is not observable.
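A minimal sketch of such a container (names are illustrative; the
casts to `pure` are the part you must audit by hand):

---
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// The GC.addRange in the constructor is always balanced by the
// GC.removeRange in the destructor, so the side effect stays
// contained within the object's lifetime.
struct PtrBuffer
{
    private void*[] slots;

    this(size_t n) pure
    {
        static void*[] acquire(size_t n)
        {
            auto p = cast(void**) malloc(n * (void*).sizeof);
            if (p is null) assert(0, "out of memory");
            // The buffer may store pointers into the GC heap, so
            // the GC must scan it:
            GC.addRange(p, n * (void*).sizeof);
            return p[0 .. n];
        }
        slots = (cast(void*[] function(size_t) pure) &acquire)(n);
    }

    @disable this(this); // no copies => removeRange runs exactly once

    ~this() pure
    {
        static void release(void*[] s)
        {
            GC.removeRange(s.ptr); // undo the addRange
            free(s.ptr);
        }
        (cast(void function(void*[]) pure) &release)(slots);
    }
}
---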
What's more, if your container were otherwise pure, but you forgot
to add the calls to GC.addRange/removeRange while storing references
to GC-allocated data inside those ranges, the resulting
use-after-free bugs would surely be drastically observable, even if
they occur rarely. So well-placed calls to
`GC.addRange/removeRange` can make your code more "pure", even if
not `pure` :D