RFC: Change what assert does on error

Sun Jul 6 02:08:43 UTC 2025

On Saturday, July 5, 2025 7:30:01 AM Mountain Daylight Time Dennis via Digitalmars-d wrote:
> On Saturday, 5 July 2025 at 07:07:00 UTC, Jonathan M Davis wrote:
> > If a mutex is locked and freed using RAII (or scope statements
> > are used, and any of those are skipped), then you could get
> > into a situation where a lock is not released like it was
> > supposed to be, and then code higher up the stack which does
> > run while the stack is unwinding attempts to get that lock
>
> Why would your crash handler infinitely wait on one of your
> program's mutexes? I'd design a crash reporter for a UI
> application as follows:
>
> - Defensively collect traces/logs up to the point of the crash
> - Store it somewhere
> - Launch a separate process that lets the user easily send the
> data to the developer
> - Exit the crashed program
>
> I think that's how most work on
> https://en.wikipedia.org/wiki/Crash_reporter
> Except that they don't even collect the data inside the crashed
> program, but let the crash handler attach a debugger like gdb to
> the process and collect it that way, which is even more defensive.
>
> I still don't see how a missed scope(exit)/destructor/finally
> block (they're interchangeable in D) not putting the hourglass
> cursor back to a normal cursor on the crashed window would hurt
> the usability of a crash handler, or the quality of the log.

Arbitrary D programs aren't necessarily using crash handlers, and the way
that Errors work affects all D programs. Also, the fact that Errors unwind
the stack at all actually gets in the way of crash handlers, because it throws
away program state. For instance, a core dump won't show you where the
program was when the error condition was hit like it would with a segfault,
and a D program that throws an Error doesn't even give you a core dump,
because it still exits normally, just with a non-zero exit code.

Honestly, I don't think that it makes any sense whatsoever for Errors to be
Throwables and yet not have all of the stack unwinding code run properly.

If an Error is such a terrible condition that we don't even want the stack
unwinding code to be run properly, then instead of throwing anything, the
program should have just printed out a stack trace and aborted right then
and there, which would avoid running any code that might cause any problems
while shutting down and giving crash handlers the best opportunity to get
information about the state of the process at the point of the error,
because the program would have terminated at that point.

On the other hand, unwinding the stack and running all of the cleanup code
gives the program a chance to terminate more gracefully as well as to get
information about the state of the program as it unwinds, which can help
programmers debug what went wrong and get information on how the program got
to where it was when the error condition occurred. And for that to work at
all safely, the cleanup code needs to be run.

The logic of the language rules potentially falls apart if the cleanup code
is skipped, and the logic that the programmer intended _definitely_ falls
apart at that point, because the language rules are written around the idea
that the cleanup code is run, and code in general is going to have been
written with the assumption that the cleanup code will all have been run
properly. And that could affect whether code is memory safe, because code
that's normally guaranteed to run wouldn't run. It would be very easy for a
decision to have been made about whether something was memory safe based on
the assumption that all of the code that's normally guaranteed to run would
have run (be it an assumption built into the language itself and @safe or an
assumption that the programmer relied on to ensure that it was reasonable to
mark their code as @trusted).
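As a minimal sketch of that kind of reasoning (withTempBuffer and current are hypothetical names, made up purely for illustration), here is a @trusted function whose safety argument rests entirely on a scope(exit) running while the stack unwinds. If that cleanup were skipped, current would be left pointing into a stack frame that no longer exists.

```d
import std.stdio : writeln;

// Hypothetical module-level pointer that must never outlive the buffer it
// points to.
int* current;

// Marked @trusted only because the scope(exit) below is assumed to always run
// before buf goes out of scope, including while the stack is being unwound.
void withTempBuffer(void delegate() @safe work) @trusted
{
    int[4] buf;                  // stack storage
    current = buf.ptr;           // temporarily expose a pointer to the stack
    scope(exit) current = null;  // the cleanup the safety argument relies on
    work();
}

void main()
{
    try
    {
        withTempBuffer(delegate void() { throw new Exception("failure in work"); });
    }
    catch (Exception e)
    {
        writeln("caught: ", e.msg);
    }
    // The scope(exit) ran during unwinding, so nothing points at the dead
    // stack frame. If it had been skipped, current would be dangling here.
    assert(current is null);
}
```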

If we skip _any_ cleanup mechanisms while unwinding the stack, we're
throwing normal language guarantees out the window and skipping code that
could have been doing just about anything that the program relied on for
proper operation (be it logging, cleaning up files, communicating with
another service about its shutdown, etc.). We don't know what
programmers decided to do in any of that code, but it was code that they
wanted run when the stack was unwound, because that's what that code was
specifically written for. Sure, maybe in some cases, if they'd thought about
it, the programmer would have preferred that some of it be skipped with an
Error as opposed to an Exception, but aside from catch(Exception) vs
catch(Error), we don't have a way to distinguish that. And I think that in
the general case, code is simply written with the idea that cleanup code
will be run whenever the stack is unwound, since that's the point of it.
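A minimal sketch of what that distinction looks like in practice (process and rolledBack are hypothetical names for illustration): scope(exit), finally, and destructors are written without regard to what is unwinding the stack, and scope(failure) fires for any Throwable, so the only way to run cleanup for an Exception but not for an Error is an explicit catch(Exception) that rethrows.

```d
import std.stdio : writeln;

// Hypothetical flag standing in for cleanup the programmer only wants to run
// when an Exception (not an Error) unwinds through process.
bool rolledBack;

void process(void delegate() work)
{
    try
    {
        work();
    }
    catch (Exception e)
    {
        // Only reached for Exceptions; an Error propagates straight past this.
        rolledBack = true;
        throw e;
    }
}

void main()
{
    rolledBack = false;
    try
    {
        process(delegate void() { throw new Exception("recoverable"); });
    }
    catch (Exception e)
    {
        writeln("caught Exception: ", e.msg);
    }
    assert(rolledBack);

    rolledBack = false;
    try
    {
        process(delegate void() { throw new Error("bug"); });
    }
    catch (Error e)
    {
        writeln("caught Error: ", e.msg);
    }
    assert(!rolledBack);
}
```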

Either way, by skipping any cleanup code, we're putting the program into an
invalid state and risking that whatever code does run during shutdown then
behaves incorrectly. And just because an Error was thrown doesn't even
necessarily mean that the program was in an invalid state. It could
have simply been that there was a bug which resulted in a bad index, and
then a RangeError was thrown before anything bad could actually happen. So,
the Error actually prevented a problem from happening, and then if the cleanup
code is skipped, it proceeds to cause problems by skipping code that's
supposed to run when the stack unwinds.

I can understand not wanting any stack unwinding code to run if an Error
occurs on the theory that the condition is bad enough that there's a risk
that some of what the stack unwinding code would do would make the situation
worse, but IMHO, then we shouldn't even have Errors. We should have just
terminated the program and thrown nothing, both avoiding running any of that
code and giving crash handlers their best chance at getting information on
the program's state. But since we do have Errors, and they're Throwables,
the program should actually run the cleanup code properly and attempt to
shut down as cleanly as it can. Trying to both throw Errors and skip the
cleanup code is the worst of both worlds, and I don't see how it makes any
sense whatsoever.

And maybe we should make the behavior configurable so that programmers can
choose which they want rather than mandating that it work one way or the
other, but what we have right now is stuck in a very bizarre place in the
middle where we throw Errors and run _most_ of the cleanup code, but we
don't run all of it.

- Jonathan M Davis