Command–query separation principle [re: @mustuse as a function attribute]
mw
mingwu at gmail.com
Wed Nov 2 21:00:44 UTC 2022
Finally I got some time to write this draft:
https://github.com/mw66/mustuse/blob/main/mustuse.md
which is copy & paste-ed below. Feel free to comment here, or log
an issue on github.
I will update, or address concerns by revising the doc there.
--------------------------------------------------------------------------
# DIP: @mustuse as function return value annotation
There are two scenarios we need to handle:
- existing legacy library code, which the programmer has no right
to modify
- new code, which the programmer has full control from the root
of the inheritance tree
## @mustuse is neither covariant nor contravariant, it is
invariant!
@mustuse as a function attribute only enforces there must be a
receiver of the function call result, and do not throw away
the returned result. It has nothing to do with the (sub-)types of
the returned value.
Let's consider the following example:
```d
abstract class AbsPaths {
// no annotation
abstract AbsPaths remove(int i); // return an AbsPaths object
with i-th element removed
}
class ImperativePaths : AbsPaths { // will modify `this` object
itself
// no @mustuse annotation, with the implicit assumption that
the caller will continue to use `this` object
override AbsPaths remove(int i) {
... // remove the i-th element of `this` object
return this;
}
}
class FunctionalPaths : AbsPaths { // will NOT modify `this`
object, but return new (separate) object on modification
@mustuse // should have this annotation
override AbsPaths remove(int i) {
AbsPaths other = new ImmutablePaths();
... // `other` is the object with the i-th element of `this`
object being removed; `this` object did not change
return other;
}
}
void main() {
AbsPaths paths = new ImperativePaths(); // or
FunctionalPaths() interchangeably
AbsPaths shortPaths = paths.remove(i); // and this line
should always work, as long as the return value is not discarded
}
```
Here both `ImperativePaths` and `FunctionalPaths` inherit from
`AbsPaths`, if one branch has @mustuse and the other one
does not, and the programmer is not forced to explicitly take the
return value and use it, these two derived classes cannot be used
interchangeably. And even worse: in the case of FunctionalPaths,
the result will be totally wrong
([the OP author's
problem](https://forum.dlang.org/post/ssagbvubpgwvewimsocj@forum.dlang.org)).
### @mustuse propagation: transitive closure
In the following class inheritance tree:
```d
----------Base-------
| | |
Derived1 Derived2 Derived3 <-user only manually maked
Derived3.remove() as @mustuse here
| | |
GrandDr1 GrandDr2 GrandDr3
|
...
```
If the programmer only manually marked Derived3.remove() as
@mustuse, then everything else (Base, Derived1, Derived2,
Derived3, GrandDr1, GrandDr2 GrandDr3, ...)'s .remove() method
will all become @mustuse (as seen by the compiler internally).
With transitive closure @mustuse rule,
Pros:
1. the method interface is consistent, e.g. at any call-site of
`AbsPaths.remove(i)`, its return value must be received.
the programmer only need to remember one interface, instead of
looking through docs/code for all the branches in the
inheritance tree, and check for a particular class, whether its
`remove(i)` method is @mustuse or not.
2. the programmer can easily switch between different derived
implementation classes of AbsPaths to maximize efficiency /
performance, without worrying about potential breakage.
Cons:
1. a few more key-strokes on every call-site of @mustuse marked
method.
## Prior work
[Paul Backus
proposal:](https://forum.dlang.org/post/cqlwlnpcbtbkzqnhicwc@forum.dlang.org)
>In other words, you cannot introduce @mustuse in a derived
>class; it must be present in the base class.
>
>This is somewhat problematic for D, because we have a universal
>base class, Object, and neither it nor any of its methods are
>@mustuse.
His reasoning is logically correct by itself, with the constraint
that legacy code is un-modifiable.
However, his proposal is not very useful because of the
constraint:
1. it won't help the existing library code where there is no
@mustuse presence today. But, these library code are where the
programmer
**want the compiler help most**, as demonstrated by the OP user
who brought up this issue on the forum. (This is also why Paul
talked about
Object, although it's not a very good example; instead we can
think about std.lib.AbsPath example above).
2. if the new rule have to fully honor the legacy (with
deficiency) code, how we can improve for future D?
Also honoring legacy code, does not mean we should not even check
for potential problems.
Even typically we cannot modify the the legacy code, at least we
want the compiler help to check where are the potential problems
are; the compiler can give warning messages, and if they are
manually verified, these findings should be formally
logged as bugs, and be fixed in the next release.
3. and if we follow this line of reasoning, it also beg the
question: whether one can remove @mustuse in a derived class.
E.g. let ImperativePaths (mutable implementation) inherit
from FunctionalPaths (immutable implementation), and the caller
can just use the `this` object as the result of the computation.
Again,
this logic is correct by itself, but it make the whole code base
brittle: what if the library author decided later one day that
s/he want to
modify the class ImperativePaths again to implement another
immutable implementation?
## Introduce two variants: @mustuse and @mustuse_remedy_legacy
1. @mustuse_remedy_legacy: this annotation is to allow programmer
flag existing legacy library code that s/he has no
right to change, but want to set a flag and let the compiler to
help to find violations; the compiler only output warnings
instead of errors. This annotation will be implicitly propagated
by the compiler throughout the whole inheritance tree.
2. @mustuse: proper, the default. For programmer to use in new
code, or library code s/he can change (from the root of the
inheritance tree). Violations of this annotation will cause
compiler error. This annotation must be explicit (just as the
keyword
`override`).
and:
a) in both cases, this function property will be transitive
closure, i.e. be propagated upwards and downwards, in all
directions.
b) in both cases, removing the annotation is not allowed in the
derived class if any supper class method carry such annotation.
Actually this rule is very simple: there is only *one* consistent
interface for any method, @mustuse is part of that method
interface;
for legacy code, @mustuse_remedy_legacy will cause compiler to
generate warning message, and for new code @mustuse will cause
the compiler to generate error message. That's all.
### @mustuse_remedy_legacy for legacy code base and pre-built
binaries
Let's revisit the motivating example:
```d
paths.remove(i); // compiles fine but does nothing
paths = paths.remove(i); // works - what I erroneously thought
the previous line was doing
```
Suppose the `paths`' type is `AbsPaths`, and the programmer have
no right to modify it (e.g. in std.lib, or even pre-built
binaries),
the compiler can only issue warning (instead of error) messages.
But the programmer must be made aware of where
these potential problems are located.
And, once the programmer discovered one such misuse problem, s/he
can try to find and fix all such potential problems by defining
a helper `class RemedyPaths` as follows:
```d
// class Paths is located in the source file that the programmer
has no right to modify
class RemedyPaths : std.lib.AbsPaths { // helper class to trace
all the occurrences of the @mustuse violation
@mustuse_remedy_legacy
override Paths remove(int i) {return null;}
}
```
and the compiler will find out all the occurrences of the same
issues in the code visited by the compilation, and issue
warnings (not errors), so the programmer have a chance to visit
all the code locations where `remove(i)`'s function
return value are discarded.
### Informative detailed compiler warning messages
Since this kind of warning message is transitive closure by
design, so for the *implicit* markings made by the
compiler: as a debugging aid the warning message should indicate
the **originating source of the annotation**
to make it clear to the programmers, e.g.:
```d
warning: std.lib.foo.d:123, AbsPaths.remove(int i)'s return value
is discarded, violates the originating
annotation from user.codebase.RemedyPaths.d:456
@mustuse_remedy_legacy.
```
### Summary
With universal (i.e. transitive closure) @mustuse,
1. when the library author has decided to return a value from a
function, typically it's represent the computation result or
status report,
which the function caller should either use or check instead of
discard. That is good engineering practice. It forces the
programmer
to pay attention to the returned value, instead of assuming the
semantics of the function e.g. based purely on the function name.
```d
ResultType result = someFunction();
... // use or check `result`
```
it will save lots of debugging time, at the expense of just a few
more key-strokes.
2. the library author can implement both ImperativePaths and
FunctionalPaths, and the library users can choose
them interchangably for the maximal efficiency / performance
without worrying about code breakage.
## History: command query separation principle
Not discarding function return value has its root from the
command query separation principle.
As an informal exercise: let's derive command query separation
principle from DbC.
The contract in DbC mostly exhibits as assertions in the code.
The programmer can insert assertions at any place of the code,
without changing the code's original semantics (i.e the behavior
when the assertions are turned-off, e.g. in release mode). In an
OOP language, most of the time the assertions are checking some
properties of an object, hence any method that can be called in
an assertion must be a query (i.e a query can be called on an
object for any number times without changing that object's
internal state). So now we have query method.
But we do need to change an object's state in imperative
programming, then those methods are classified as commands. After
changing an object state, the command must NOT return any value.
Why? because otherwise, the programmer may accidentally want to
call that command and check the returned value in some assertions
... then you know what happens in the end: the program behaves
differently when assertions are turn on in debug mode and off in
release mode.
Therefore, we have this:
> every method should either be a command that performs an
> action, or a query that returns data to the caller, but not
> both.
## References:
- [Liskov substitution
principle](https://en.wikipedia.org/wiki/Liskov_substitution_principle)
- [Command query separation
principle](https://en.wikipedia.org/wiki/Command%E2%80%93query_separation#:~:text=Command-query%20separation%20(CQS),the%20caller%2C%20but%20not%20)
- [@mustuse
DIP1038.md](https://github.com/dlang/DIPs/blob/master/DIPs/accepted/DIP1038.md#mustuse-as-a-function-attribute)
--------------------------------------------------------------------------
More information about the Digitalmars-d
mailing list