The extent of trust in errors and error handling

Chris Wright via Digitalmars-d digitalmars-d at puremagic.com
Wed Feb 1 18:29:57 PST 2017


On Wed, 01 Feb 2017 11:25:07 -0800, Ali Çehreli wrote:
> 1) There is the well-known issue of whether Error should ever be caught.
> If Error represents conditions where the application is not in a defined
> state, hence it should stop operating as soon as possible, should that
> also carry over to other applications, to the OS, and perhaps even to
> other systems in the whole cluster?

My programs tend to apply operations to a queue of data. It might be a 
queue over time, like incoming requests, or it might be a queue based on 
something else, like URLs that I extract from HTML documents.

Anything that does not impact my ability to manipulate the queue can be 
safely caught and recovered from.

Stack overflow? Be my guest.

Null pointer? It's a bug, but it's probably specific to a small subset of 
queue items -- log it, put it in the dead letter queue, move on.

RangeError? Again, a bug, but I can successfully process everything else.

Out of memory? This is getting a bit dangerous -- if I dequeue another 
item after OOM, processing it might well succeed (for instance, maybe I 
tried to download a 40GB HTML document, but the next one is reasonably 
small). But OOM isn't necessarily that easy to recover from, and it might 
compromise my ability to manipulate the queue.

A failed assertion? That obviously isn't a good situation, but it's 
likely to apply only to a subset of the data.

This requires me to have two flavors of error handling: one for queue 
operations and one for the function I'm applying to each item.
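
In D terms, that split looks roughly like this -- a minimal sketch, where 
processItem, drainQueue, and the dead-letter queue are hypothetical 
stand-ins, not anything from a real library:

import std.stdio;

// Hypothetical per-item work; any Error it throws is assumed to be
// confined to the current item.
void processItem(string url)
{
    // download, parse, etc.
}

void drainQueue(string[] queue)
{
    string[] deadLetter;
    foreach (url; queue)
    {
        try
        {
            processItem(url);
        }
        catch (Throwable t) // Throwable covers both Exception and Error
        {
            // A bug, but probably specific to this item: log it,
            // park the item in the dead-letter queue, move on.
            stderr.writefln("failed on %s: %s", url, t.msg);
            deadLetter ~= url;
        }
    }
    // Anything thrown by the queue machinery itself, outside the
    // try block, still propagates and takes the process down.
}

Catching Throwable is exactly the contested practice: after an Error the 
language makes no guarantees, but when the damage is plausibly confined 
to one item, the trade-off can be worth it.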

> For example, if a function detected an inconsistency in a DB that is
> available to all applications (as is the case in the Unix model of
> user-based access protection), should all processes that use that DB
> stop operating as well?

As stated, that implies each application tags itself with whether it 
accesses that database. Then, when the database is known to be 
inconsistent, we immediately shut down every application that's tagged as 
using that database -- and presumably prevent other applications with the 
tag from starting.

It seems much more friendly not to punish applications when they're not 
trying to use the affected resource. Maybe init read a few configuration 
flags from the database on startup and never has to touch it again. Maybe 
a human will resolve the problem before this application
makes its once-per-day query.

> 2) What if an intermediate layer of code did in fact handle an Error
> (perhaps raised by a function pre-condition check)? Should the callers
> of that layer have a say on that? Should a higher level code be able to
> say that Error should not be handled at all?
> 
> For example, an application code may want to say that no library that it
> uses should handle Errors that are thrown by a security library.

There's a bit of a wrinkle there. "Handling" an error might include 
catching it, adding some extra data, and then rethrowing.
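
For instance (a rough D sketch; step and runStep are hypothetical names):

// Hypothetical callee that may throw an Error.
void step(string item)
{
    // ...
}

void runStep(string item)
{
    try
    {
        step(item);
    }
    catch (Error e)
    {
        // Caught, but not swallowed: attach context and rethrow,
        // so the Error still propagates and terminates callers.
        throw new Error("while processing " ~ item ~ ": " ~ e.msg, e);
    }
}

A blanket rule that no intermediate layer may catch Errors from a given 
library would forbid this kind of decoration as well.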

> I think there is no way of
> requiring that e.g. a square root function not have side effects at all:
> The compiler can allow a piece of code but then the library that was
> actually linked with the application can do anything else that it wants.

You can write a compiler with its own object format and linker, which lets 
you verify these promises at link time.

As an aside on this topic, I might recommend looking at Vigil, the 
eternally morally vigilant programming language:
https://github.com/munificent/vigil

It has a rather effective way of dealing with errors that aren't 
explicitly handled.
