Alternatives to exceptions for error handling

Roman Kashitsyn romankashicin at gmail.com
Mon Nov 30 18:48:05 UTC 2020


On Monday, 30 November 2020 at 09:58:36 UTC, Gregor Mückl wrote:

> What kind of error conditions are you talking about that you 
> consider handleable locally? Do you have concrete examples? I 
> am asking because this is way outside the experiences I have 
> made regarding error handling and I would like to understand 
> your perspective.


Sure, let me give a couple of examples.

Imagine you are writing a step of a pipeline that needs to 
support backpressure.
When you push data downstream and the call fails because 
downstream is overloaded, a good recovery would be to start 
buffering data and propagate the failure upstream when the local 
buffer fills up.
If we terminated the whole pipeline every time there was minor 
congestion, we would never get anything done.
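A minimal Python sketch of such a buffering step, assuming the 
downstream call reports overload by returning False instead of 
raising. The names BufferingStage, push and capacity are 
hypothetical and not taken from any library.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class BufferingStage(Generic[T]):
    """Hypothetical pipeline step: buffers locally while downstream is
    overloaded and reports failure upstream only when the buffer is full."""

    def __init__(self, downstream: Callable[[T], bool], capacity: int) -> None:
        # Assumption: downstream(item) returns True if it accepted the item
        # and False if it is overloaded (an error value, not an exception).
        self._downstream = downstream
        self._capacity = capacity
        self._buffer: Deque[T] = deque()

    def _flush(self) -> None:
        # Send buffered items in order until downstream refuses one.
        while self._buffer and self._downstream(self._buffer[0]):
            self._buffer.popleft()

    def push(self, item: T) -> bool:
        """Returns False (backpressure for the upstream step) only when
        the item can neither be delivered nor buffered."""
        self._flush()
        # Bypass the buffer only when it is empty, so ordering is preserved.
        if not self._buffer and self._downstream(item):
            return True
        if len(self._buffer) >= self._capacity:
            return False  # buffer full: propagate the failure upstream
        self._buffer.append(item)
        return True
```

Because items go straight downstream only when the buffer is 
empty, a congested period never reorders data; upstream sees a 
failure only after `capacity` items are already waiting.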


Another example: I was once implementing a distributed task 
execution service using Zookeeper.  It had scheduler processes 
and worker processes distributed across multiple DCs. Only one 
scheduler may be active at any time; the other scheduler 
instances wait in standby mode in case the leader dies or 
becomes partitioned.
First, the Zookeeper API is callback-based, and throwing 
exceptions into an event loop that your application does not 
control makes little sense. So we already need a different 
error handling mechanism for such cases.
Let's see, then, what happens if a node loses its network 
connection for a short period of time.
For a worker, it's not a problem at all: it should continue 
whatever it was doing and wait for the network to come back. 
Most tasks were quite expensive to run, so aborting them was a 
bad idea.
A scheduler cannot operate without the network, so simply 
crashing would be an option. However, this involves an expensive 
recovery procedure, and short network disruptions happened very 
often. The strategy I implemented was to schedule an action that 
retries whatever we were trying to do as soon as the network 
comes back, provided this process is still the leader; 
otherwise, the process crashes.
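The retry-on-reconnect strategy can be sketched in Python as 
follows. This is an illustration under stated assumptions, not 
the original implementation: ConnEvent, Scheduler, attempt, 
is_leader and crash are hypothetical names, and the two events 
stand in for whatever connection notifications a callback-based 
client delivers (they are not the real ZooKeeper client API). 
Failures are reported as return values, so nothing is ever 
raised into the client's event loop.

```python
from enum import Enum, auto
from typing import Callable, List


class ConnEvent(Enum):
    """Hypothetical connection events a callback-based client might deliver;
    illustrative only, not the real ZooKeeper client API."""
    DISCONNECTED = auto()
    RECONNECTED = auto()


class Scheduler:
    """Sketch of retry-on-reconnect: operations that failed because the
    network was down are queued and retried once the connection returns."""

    def __init__(self, is_leader: Callable[[], bool],
                 crash: Callable[[], None]) -> None:
        self._is_leader = is_leader  # hypothetical leadership check
        self._crash = crash          # hypothetical: e.g. terminate the process
        self._retry: List[Callable[[], bool]] = []

    def attempt(self, action: Callable[[], bool]) -> None:
        # Assumption: action() returns False if it failed due to a lost
        # connection; such actions are scheduled for a later retry.
        if not action():
            self._retry.append(action)

    def on_connection_event(self, event: ConnEvent) -> None:
        # Invoked from the client's event loop, which we do not control,
        # so this handler never raises.
        if event is ConnEvent.DISCONNECTED:
            return  # short outages are common: just wait
        if not self._is_leader():
            self._retry.clear()
            self._crash()  # another scheduler took over: give up
            return
        pending, self._retry = self._retry, []
        for action in pending:
            self.attempt(action)  # re-queued if it fails again
```

A worker, by contrast, would handle both events by doing nothing 
and simply keep running its current task.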


More information about the Digitalmars-d mailing list