Towards a better conceptual model of exceptions (Was: Re: The Right Approach to Exceptions)

Tue Feb 21 13:52:01 PST 2012

On Tue, Feb 21, 2012 at 02:40:30PM -0500, Jonathan M Davis wrote:
> On Tuesday, February 21, 2012 00:15:48 H. S. Teoh wrote:
> > TRANSITIVITY
> 
> I still contend that this useless, because you need to know what went wrong to 
> know whether you actually want to retry anything. And just because the 
> particular operation that threw could be retried again doesn't mean that the 
> code that catches the exception can retry the function that _it_ called which 
> resulted in an exception somewhere down the call stack. And often, you don't 
> even know which function it was that was called within the try block. So, I 
> don't see how transivity really matters. If you know what what wrong - which 
> the type of the exception will tell you - then _that_ is what helps your code 
> make a useful decision as to what to do, not a transivity property.

The point of my little exercise was not to try to solve *every* case of
exception handling, but to isolate those cases for which generic
handling *does* make sense. I don't pretend that my categories cover
*every* case. They obviously don't, as you point out.  For those cases
where it doesn't make sense, you still catch the specific exception and
do your specific recovery, as before.

What I'm trying to do is to evaluate the possibility/feasibility of an
error recovery system where you *don't* have to know what the specific
error is, in order to do something useful with it. Which errors, if any,
*can* be recovered from this way?

This is why transitivity is useful, because it allows you to not have to
worry about what the specific problem is, and yet still be able to do
something meaningful, because the problem doesn't change in nature as
you move up the call stack. Obviously, non-transitive errors have to be
handled on a case-by-case basis; I'm not negating that at all.

The importance of transitivity to generic handling can be seen as
follows:

Suppose X calls Y and Y calls Z. Z encounters a problem of some sort,
let's say for instance it's an input error. So from Y's point of view,
the problem can be corrected if it passes different input to Z.  But
from X's point of view, this is not necessarily true: X's input to Y may
have nothing to do with Y's input to Z. So trying to generically handle
Z's problem at X's level makes no sense. The problem is not transitive,
so no generic handling is possible. You have to catch by type and
recover by type. End of story.

But suppose Z encounters a transitive problem. Say for instance it's a
component failure: one of the functions that Z calls has failed. Well,
by extension, that also means Z itself has failed. Now, from Y's point
of view, it can attempt to recover by calling W in lieu of Z, if there
exists a W that performs an equivalent function to Z. But since the
problem is transitive, we can go up to X's level. From X's point of
view, it knows nothing about Z, but it *does* know that Y has
encountered a component failure (since component failure is transitive).
Viewed from X's perspective, Y is the problem (it doesn't know about Z).
So if there's an alternative to Y that performs an equivalent function,
X can call that in lieu of Y.

So you see, even though X has absolutely no idea what kind of problem Z
encountered, or even that such a thing as Z exists, it *can* make a
meaningful effort to recover from the problem that Z encountered. That's
what I'm trying to get at: generic handling.
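
To make the X/Y/Z story concrete, here is a rough sketch. It's in Python
purely for brevity, and every name in it (ComponentFailure,
first_working, y_alt) is made up for illustration; none of this is an
existing API. It assumes only that "component failure" is a single
exception category which every layer can recognize without knowing
where it originated.

```python
class ComponentFailure(Exception):
    """Transitive: if a callee fails this way, so has its caller."""


def z():
    # Hypothetical low-level routine that breaks.
    raise ComponentFailure("z is unavailable")


def y():
    # Y has no alternative to Z here, so the failure simply propagates.
    return "y(" + z() + ")"


def y_alt():
    # Hypothetical equivalent of Y that does not depend on Z.
    return "y_alt()"


def first_working(*candidates):
    """Try equivalent components in order; return the first result."""
    last = None
    for candidate in candidates:
        try:
            return candidate()
        except ComponentFailure as e:
            last = e
    raise ComponentFailure("all alternatives failed") from last


def x():
    # X knows nothing about Z, only that Y is one way to get the job done.
    return first_working(y, y_alt)
```

Note that first_working never inspects what went wrong inside Y. It
relies only on the failure still being a component failure by the time
it reaches X, which is exactly the transitivity property.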

Now, to avoid misunderstandings, I'm *not* saying that at every level of
the call stack there needs to be a recovery mechanism, such that Y has
an alternative to Z and X has an alternative to Y, and so on, all the
way up the stack. That would be foolish, since you'll be wasting time
writing alternative versions of everything.

How this approach benefits real-life programs is that you can insert
recovery mechanisms at strategic points in the call chain, so that when
problems occur lower down the stack, you can handle them at those points
without needing to unwind the stack all the way to the top. Obviously,
the try-catch mechanism already does this; but the difference is that
those strategic points may not be high enough up the call chain to be
able to make a decision about *which* recovery approach to take. They
know how to implement the recovery, but they don't know if they should
try, or just give up. This is where the high-level delegates
come in. *They* know how to decide whether to attempt recovery or just
give up, but they *don't* know how to implement recovery, because the
lower-level code is a black box to them.

By tying the two together via the "Lispian system", you have the
possibility of making decisions high up the call stack, and yet still be
able to effect low-level recovery strategies.

To go back to the earlier example: if Z fails, then from X's point of
view Y has also failed, since it doesn't know what Z is. However, if X
registers a component failure handler and then calls Y, and Y is capable
of switching to W instead of Z, then when Z fails, X's delegate gets to
say "Oh, we have a component failure somewhere in Y. Let's try to
replace that component" -- even though X has no idea how to do this, nor
even what that failed component was. But Y does, so after X's delegate
says "try to replace the failed component", Y swaps in W, and continues
doing what it was supposed to.

If you still doubt the usefulness of such a system, consider this: X
implements a complex numerical computation, one of whose many operations
is solving a linear system, which is done by Y. To solve the
system, Y at some point calls Z, which, say, inverts a matrix. But due
to the particular algorithm that Z uses for matrix inversion, it runs
into a numerical overflow, so it throws an error. Y catches the error,
and informs X's delegate, "hey, I've encountered a problem while solving
your linear system, but I know of another algorithm that I can try that
may give you the result you want; should I proceed?". X's delegate then
can decide to give up - "No, this is a real-time system, and there's not
enough time left to attempt another calculation", or it can decide to go
ahead "I want that system solved, no matter what!". If X's delegate
decides to continue, Y will swap in an alternative way of inverting the
matrix that avoids overflow, and thus be able to finish its computation.
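
A toy version of this scenario, sketched in Python. The 2x2 routines,
the rescaling trick and the should_retry parameter are all assumptions
of mine, chosen only to keep the example small; real code would use a
proper solver. The matrix is assumed to be nonsingular.

```python
import math


def invert_naive(m):
    """Z: 2x2 inverse via the determinant; overflows for huge entries."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if not math.isfinite(det):
        raise OverflowError("determinant overflowed")
    return [[d / det, -b / det], [-c / det, a / det]]


def invert_scaled(m):
    """W: same result, but rescale first so the determinant stays in range."""
    s = max(abs(v) for row in m for v in row)
    inv = invert_naive([[v / s for v in row] for row in m])
    return [[v / s for v in row] for row in inv]


def solve(m, rhs, should_retry):
    """Y: solve m * x = rhs, consulting the caller's delegate on trouble."""
    try:
        inv = invert_naive(m)
    except OverflowError as e:
        if not should_retry(str(e)):
            raise
        inv = invert_scaled(m)
    return [sum(k * r for k, r in zip(row, rhs)) for row in inv]
```

A real-time X might pass a delegate that returns False once its time
budget is spent, while a batch job would pass `lambda why: True`. Either
way, solve() itself stays ignorant of the environment it runs in.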

If there were no way for Y to talk to X (via X's delegate) while control
is still in the context of Y, then Y would be forced to make a decision
it's not qualified to make (it may be just a generic algebraic package
that doesn't know whether it's running in a real-time environment), or it
has to unwind the stack back to X, at which point it's too late for X to
salvage the situation (it has to call Y all over again if it wants to
re-attempt the computation -- and even then there may not be a way for X
to tell Y to replace Z with W, because X doesn't even know what Z and W
are).

> So, yes. A particular exception type is generally transitive or not,
> but you know that by the nature of the type and the problem that it
> represents. I don't see how focusing on transivity is useful.
[...]

It's funny, earlier in this thread I argued the same thing about
Andrei's is_transient proposal.

But I've since come to realize what Andrei was trying to get at: yes we
know how to deal with specific errors in a specific way, that's given;
but *are* there cases for which we don't *need* to know the specifics?
Can we take advantage of generic handling for those cases, so as to
reduce (not eliminate!) the amount of specific handling we need to do?

I think that angle is worth pursuing, even if it's to conclude at the
end that, no, generic handling is not possible/feasible. OK, lesson
learned. We move on.

But if there are cases for which generic handling *is* possible, and if
we can implement powerful error recovery schemes in a generic way, then
we'd be foolish not to take advantage of it.

[...]
> So, I see no problem with you experimenting, but I don't think that we
> should be drastically changing how the standard library functions with
> regards to exceptions. We would definitely gain something by cleaning
> up how they're organized, and maybe adding additional capabilites to
> improve their printing abilities like Andrei wants to do would be of
> some value, but we don't need a complete redesign, just some tweaks.
[...]

Currently deadalnix and I are experimenting with ways of implementing
the "Lispian system" as a separate Phobos module that offers enhanced
exception handling, on top of the usual try/catch mechanism. I'm not
proposing to replace anything, at least not in the short term. Should
this system prove worthwhile in the long run, *then* we can start
thinking about whether to make extensive use of it in the library
code, or to rework exception handling at the language level.

T

-- 
Two wrongs don't make a right; but three rights do make a left...