Direct recursion detection possible?

Thu May 25 10:35:45 UTC 2023

On Wednesday, 24 May 2023 at 23:04:53 UTC, Cecil Ward wrote:
> Due to some stupidity of mine involving misuse of templates I 
> ended up with a routine that simply called itself infinitely. 
> Directly, that is, and in this case the routine was optimised 
> down into a simple infinite loop with no loop body content.

I had this almost happen in my C++ code. It never ran because I 
got a compile error. The compiler is MSVC.

What I did is quite simple: Have an overload that forwards to 
another overload; essentially:
```d
R f(actual args) { /*actual work...*/ }
R f(alternative args) => f(actual args);
```
I misspelled the second and it called itself.

> Is it possible for our compilers to detect direct infinite 
> recursion? I’m ignoring the indirect call case, and I’m also 
> ignoring anything involving function pointers, so given those 
> restrictions, we will be able to see that the generated code of 
> a routine is calling its own address, because that address is a 
> literal and we know what the current function is that we’re 
> currently in.

As MSVC shows, the answer is yes. Obviously, simple cases are 
detectable. Of course, solving this 100% is equivalent to the 
halting problem. This is not a problem whatsoever; D has multiple 
examples of best-effort approaches: `@safe` is a good example. 
Not every code that actually is without undefined behavior can be 
detected as such, but a lot of simple cases can. Likewise, not 
every infinite recursion (or infinite loop without side-effects) 
can be detected, but simple cases can.

I know a little about C++ compilers: GCC and Clang produce 
warnings in the font-end, but MSVC produces warnings in the 
back-end. For that reason, MSVC can report problems that became 
apparent after optimization.

In C++, an infinite loop without observable effects is undefined 
behavior. The optimizer can use this in the form that a loop that 
is not obviously finite can be assumed to be finite if it does 
not have observable effects.

To my knowledge, D does not make infinite loop without observable 
effects undefined behavior, but D has another policy: It’s 
reasonable to make an error what the programmer cannot possibly 
want. An infinite loop without observable effects is an example 
of that.

> And while we’re at it, is it possible to detect all kinds of 
> straightforward infinite loops (not coming from recursion)?

I’d say that “all kinds of straightforward” is a contradiction in 
itself. The term “straightforward” is subjective. The detection 
provably cannot be perfect. We have to select cases.

> That is where the generated code, perhaps as a result of 
> optimisation, is a straight loop with no body. Can we implement 
> that?

Doing that after optimization has one problem: Optimization is 
optional and subject to invisible improvement. Whether you get 
compile-errors should not depend on which compiler you use and 
what optimization level you specified.

Practical example: Say you have code that is practically dead 
code and has an infinite loop. It’s practice never reached (or 
provably unreachable, but the compiler doesn’t detect it). 
Everything compiles and is fine. Now we make the infinite loop an 
error, when it’s detected after optimization. The optimizer 
doesn’t reduce it enough, so still everything compiles and is 
fine. In a later compiler version, the optimizer is improved, and 
now the loop is detected and the build fails. This is not a great 
user experience.

It’s totally another thing if it’s a warning.

It’s totally another thing if the detection is in the front end, 
the changelog mentions an improvement to infinite loop detection 
and the code falls under it.
If my boss asks my why the build failed, I could point to the 
compiler update and the changelog.

> It ought to be possible because the backend can spot a pattern 
> like "L1:  jmp  L1".

* If it comes from the front-end and is consistent among 
compilers and optimization levels, it should be an error. It 
could be a warning, but the no-warnings policy applies.
* If it comes from the back-end, it should be a warning.

> I promise I’m not trying to solve the Halting Problem.

The Halting Problem can be solved in special cases. It is 
perfectly reasonable to ask for best-effort solutions. `@safe` is 
one of them.

A crucial component is `pure`, which D fortunately already has. A 
quick sketch would be:
* A loop with a constant-`true` condition is infinite if it 
contains no `break`. It has definitely no observable effects if 
all operations in it are `pure`.
* A function definitely recurses infinitely if the only reachable 
`return` statements lead to a recursion. It has no observable 
effects if all operations between the function entry and those 
`return` statements are `pure`.

In contrast to `@safe`, which is an under-approximation, the 
infinite loop detection is an over-approximation. By that I mean 
that – accepts-invalid bugs aside – code that is `@safe` today 
will remain `@safe` in the future; code that today isn’t `@safe` 
but would be valid as `@trusted` today might be `@safe` in 
v2.142.1. The `@safe` checks reject code that *might* be unsafe. 
The infinite loop detection wouldn’t reject code that *might* be 
an infinite loop (there are languages that do that; e.g. 
[Coq](https://en.wikipedia.org/wiki/Coq) guarantees termination); 
it would reject some code that *definitely* is an infinite loop. 
And with improvements, there might be more and more (not less and 
less) cases that will be detected as *definitely* an infinite 
loop.

DMD neither detects `while(1){}` nor `int f(int x) => f(x);`. 
Those would be bare-minimum test cases. You get an error for the 
latter if you make the return type `auto`, so there is something 
to work with. `auto` return type detection accounts for infinite 
loops, even via mutual recursion (e.g. `f` calls `g`, `g` calls 
`h`, `h` calls `f`).

Actually, you can count the fact that, in D, an empty statement 
is invalid for loop bodies, is a measure against unintentional 
infinite loops: `while (cond());` in usual C++ compilers triggers 
an infinite-loop warning; in D, it’s syntactically invalid.