Direct recursion detection possible?
Quirin Schroll
qs.il.paperinik at gmail.com
Thu May 25 10:35:45 UTC 2023
On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
> Due to some stupidity of mine involving misuse of templates I
> ended up with a routine that simply called itself infinitely.
> Directly, that is, and in this case the routine was optimised
> down into a simple infinite loop with no loop body content.
I had this almost happen in my C++ code. It never ran because I
got a compile error. The compiler is MSVC.
What I did is quite simple: Have an overload that forwards to
another overload; essentially:
```d
R f(actual args) { /*actual work...*/ }
R f(alternative args) => f(actual args);
```
I misspelled the second and it called itself.
> Is it possible for our compilers to detect direct infinite
> recursion? I’m ignoring the indirect call case, and I’m also
> ignoring anything involving function pointers, so given those
> restrictions, we will be able to see that the generated code of
> a routine is calling its own address, because that address is a
> literal and we know what the current function is that we’re
> currently in.
As MSVC shows, the answer is yes. Obviously, simple cases are
detectable. Of course, solving this 100% is equivalent to the
halting problem. This is not a problem whatsoever; D has multiple
examples of best-effort approaches: `@safe` is a good example.
Not every code that actually is without undefined behavior can be
detected as such, but a lot of simple cases can. Likewise, not
every infinite recursion (or infinite loop without side-effects)
can be detected, but simple cases can.
I know a little about C++ compilers: GCC and Clang produce
warnings in the font-end, but MSVC produces warnings in the
back-end. For that reason, MSVC can report problems that became
apparent after optimization.
In C++, an infinite loop without observable effects is undefined
behavior. The optimizer can use this in the form that a loop that
is not obviously finite can be assumed to be finite if it does
not have observable effects.
To my knowledge, D does not make infinite loop without observable
effects undefined behavior, but D has another policy: It’s
reasonable to make an error what the programmer cannot possibly
want. An infinite loop without observable effects is an example
of that.
> And while we’re at it, is it possible to detect all kinds of
> straightforward infinite loops (not coming from recursion)?
I’d say that “all kinds of straightforward” is a contradiction in
itself. The term “straightforward” is subjective. The detection
provably cannot be perfect. We have to select cases.
> That is where the generated code, perhaps as a result of
> optimisation, is a straight loop with no body. Can we implement
> that?
Doing that after optimization has one problem: Optimization is
optional and subject to invisible improvement. Whether you get
compile-errors should not depend on which compiler you use and
what optimization level you specified.
Practical example: Say you have code that is practically dead
code and has an infinite loop. It’s practice never reached (or
provably unreachable, but the compiler doesn’t detect it).
Everything compiles and is fine. Now we make the infinite loop an
error, when it’s detected after optimization. The optimizer
doesn’t reduce it enough, so still everything compiles and is
fine. In a later compiler version, the optimizer is improved, and
now the loop is detected and the build fails. This is not a great
user experience.
It’s totally another thing if it’s a warning.
It’s totally another thing if the detection is in the front end,
the changelog mentions an improvement to infinite loop detection
and the code falls under it.
If my boss asks my why the build failed, I could point to the
compiler update and the changelog.
> It ought to be possible because the backend can spot a pattern
> like "L1: jmp L1".
* If it comes from the front-end and is consistent among
compilers and optimization levels, it should be an error. It
could be a warning, but the no-warnings policy applies.
* If it comes from the back-end, it should be a warning.
> I promise I’m not trying to solve the Halting Problem.
The Halting Problem can be solved in special cases. It is
perfectly reasonable to ask for best-effort solutions. `@safe` is
one of them.
A crucial component is `pure`, which D fortunately already has. A
quick sketch would be:
* A loop with a constant-`true` condition is infinite if it
contains no `break`. It has definitely no observable effects if
all operations in it are `pure`.
* A function definitely recurses infinitely if the only reachable
`return` statements lead to a recursion. It has no observable
effects if all operations between the function entry and those
`return` statements are `pure`.
In contrast to `@safe`, which is an under-approximation, the
infinite loop detection is an over-approximation. By that I mean
that – accepts-invalid bugs aside – code that is `@safe` today
will remain `@safe` in the future; code that today isn’t `@safe`
but would be valid as `@trusted` today might be `@safe` in
v2.142.1. The `@safe` checks reject code that *might* be unsafe.
The infinite loop detection wouldn’t reject code that *might* be
an infinite loop (there are languages that do that; e.g.
[Coq](https://en.wikipedia.org/wiki/Coq) guarantees termination);
it would reject some code that *definitely* is an infinite loop.
And with improvements, there might be more and more (not less and
less) cases that will be detected as *definitely* an infinite
loop.
DMD neither detects `while(1){}` nor `int f(int x) => f(x);`.
Those would be bare-minimum test cases. You get an error for the
latter if you make the return type `auto`, so there is something
to work with. `auto` return type detection accounts for infinite
loops, even via mutual recursion (e.g. `f` calls `g`, `g` calls
`h`, `h` calls `f`).
Actually, you can count the fact that, in D, an empty statement
is invalid for loop bodies, is a measure against unintentional
infinite loops: `while (cond());` in usual C++ compilers triggers
an infinite-loop warning; in D, it’s syntactically invalid.
More information about the Digitalmars-d
mailing list