Alternatives to exceptions for error handling
Roman Kashitsyn
romankashicin at gmail.com
Mon Nov 30 18:48:05 UTC 2020
On Monday, 30 November 2020 at 09:58:36 UTC, Gregor Mückl wrote:
> What kind of error conditions are you talking about that you
> consider handleable locally? Do you have concrete examples? I
> am asking because this is way outside the experiences I have
> made regarding error handling and I would like to understand
> your perspective.
Sure, let me give a couple of examples.
Imagine you are writing a step of a pipeline that needs to
support backpressure.
When you push data downstream and the call fails because
downstream is overloaded, a good recovery would be to start
buffering data and propagate the failure upstream when the local
buffer fills up.
If we terminated the whole pipeline every time there was minor
congestion, we would never get anything done.
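To make the backpressure case concrete, here is a minimal sketch in Python. The names Stage, PushResult and downstream_push are made up for illustration and do not come from any real library. The stage reports overload as a return value, absorbs congestion in a bounded local buffer, and only reports the failure to its caller once that buffer is full.

```python
from collections import deque
from enum import Enum


class PushResult(Enum):
    OK = "ok"
    OVERLOADED = "overloaded"


class Stage:
    """One pipeline step that buffers locally when downstream is overloaded."""

    def __init__(self, downstream_push, capacity):
        # downstream_push is an assumed callable: item -> PushResult
        self._downstream_push = downstream_push
        self._capacity = capacity
        self._buffer = deque()

    def _drain(self):
        # Forward buffered items in order until downstream pushes back.
        while self._buffer:
            if self._downstream_push(self._buffer[0]) is PushResult.OVERLOADED:
                return False
            self._buffer.popleft()
        return True

    def push(self, item):
        # Flush older items first so that ordering is preserved.
        if self._drain() and self._downstream_push(item) is PushResult.OK:
            return PushResult.OK
        if len(self._buffer) < self._capacity:
            self._buffer.append(item)  # recover locally: absorb the congestion
            return PushResult.OK
        return PushResult.OVERLOADED  # buffer is full: propagate upstream
```

The caller sees OVERLOADED only when the stage can no longer hide the problem, and it is then free to apply the same strategy one level up.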
Another example: I once implemented a distributed task
execution service using ZooKeeper. It had scheduler processes
and worker processes distributed across multiple DCs. Only one
scheduler may be active at any time, while the other scheduler
instances wait in standby mode in case the leader dies or
becomes partitioned.
First, the ZooKeeper API is callback-based, and throwing
exceptions into an event loop that your application does not
control makes little sense. So we already need a different error
handling mechanism for such cases.
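A rough sketch of what that mechanism can look like, in Python. FakeClient, Rc and make_handler are hypothetical stand-ins written for this example; this is not the actual ZooKeeper client API. The point is only that the loop owned by the library hands the outcome to the callback as a value, and the callback branches on it instead of raising.

```python
from enum import Enum


class Rc(Enum):
    OK = 0
    CONNECTION_LOSS = 1


class FakeClient:
    """Hypothetical callback-based client; not the ZooKeeper API."""

    def __init__(self):
        self._pending = []

    def get_data(self, path, callback):
        # The request is only queued; the callback fires later from the loop.
        self._pending.append((path, callback))

    def run_loop(self, connected):
        # The library-owned loop delivers each outcome as a result code.
        pending, self._pending = self._pending, []
        for path, callback in pending:
            if connected:
                callback(Rc.OK, f"data at {path}")
            else:
                callback(Rc.CONNECTION_LOSS, None)


def make_handler(log):
    def on_data(rc, data):
        # Branch on the result code instead of raising into the loop.
        if rc is Rc.OK:
            log.append(("got", data))
        else:
            log.append(("failed", rc))
    return on_data
```

An exception raised inside on_data would unwind into run_loop, which has no idea what to do with it, so the error has to travel as data.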
Let's see then what happens if a node loses its network
connection for a short period of time.
For a worker it's not a problem at all: it should continue
whatever it has been doing and wait for the network to come
back. Tasks were mostly quite expensive to run, so aborting
them was a bad idea.
A scheduler cannot operate without the network, so simply
crashing would be an option. However, this involves an expensive
recovery procedure, and short network disruptions happened very
often. The strategy I implemented was to schedule an action that
retries whatever we were trying to do as soon as the network
comes back (if this process is still the leader; otherwise it
crashes).
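The scheduler strategy can be sketched roughly like this, again in Python. The Scheduler class, its fields and its method names are invented for illustration and are not the code of the original service. Actions report a network failure through their return value, failed actions are parked, and they are replayed on reconnect only if the process still holds leadership.

```python
class Scheduler:
    """Hypothetical scheduler that defers failed actions until reconnect."""

    def __init__(self):
        self.connected = True
        self.is_leader = True
        self._deferred = []

    def on_disconnect(self):
        self.connected = False

    def run(self, action):
        # `action` is an assumed callable returning True on success and
        # False on a network failure.
        if not self.connected or not action():
            self._deferred.append(action)  # retry later instead of crashing

    def on_reconnect(self):
        self.connected = True
        if not self.is_leader:
            # Leadership was lost while we were away: now the expensive
            # recovery path is the right choice, so terminate the process.
            raise SystemExit("lost leadership, restarting")
        deferred, self._deferred = self._deferred, []
        for action in deferred:
            self.run(action)
```

Short disruptions then cost only a delayed retry, and the crash is reserved for the one case where continuing would be wrong.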