Sutter's ISO C++ Trip Report - The best compliment is when someone else steals your ideas....

Sun Jul 15 10:45:23 UTC 2018

On Friday, 13 July 2018 at 12:55:33 UTC, Adam D. Ruppe wrote:
> You use process isolation so it is easy to restart part of it 
> without disrupting others. Then it can crash without bringing 
> the system down. This is doable with segfaults and range 
> errors, same as with exceptions.
>
> This is one of the most important systems engineering 
> principles: expect failure from any part, but keep the system 
> as a whole running anyway.

If we are talking about something application-specific and in 
probabilistic terms, then yes, certainly.

But that is not the absolutist position, where any failure should 
lead to a shutdown (and consequently to a ban on rebooting, since 
the failed assert might fire hours after the buggy code actually 
executed).

The absolutist position would also have to assume that all 
communicated state is corrupted, so a separate process does not 
improve the situation. Since you don't know with 100% certainty 
what the bug consists of, you should not retain any state from any 
source after the _earliest_ time at which the buggy logic could, 
in theory, have been involved. All databases should be assumed 
corrupted, no messages should be accepted, etc. (messages and 
databases are no different from memory in this regard).

In reality, absolutist positions are usually not possible to 
uphold, so you have to move to a probabilistic position. And the 
compiler cannot make probabilistic assumptions; you need a lot of 
contextual understanding to make those probabilistic assessments 
(i.e. the architect or programmer has to be involved).

Fully reactive systems do not retain state, of course, and those 
would change the argument somewhat, but they are very rare... 
mostly limited to control systems (cars, airplanes, etc.).

The idea behind actor-based programming (e.g. Erlang) isn't that 
bugs don't occur or that the overall system will exhibit correct 
behaviour, but that it should be able to correct itself or adapt 
to situations despite bugs being present. But that is, for the 
most part, not available to us with the very "crisp" logic we 
use in current languages (true/false, all or nothing). Maybe 
something better will come out of probabilistic programming 
paradigms and software synthesis some time in the future. Within 
the current paradigms we are stuck with the judgment of the 
humans involved.

Interestingly, biological systems are much better at robustness, 
fault tolerance and self-healing, but that involves a lot of 
overhead and also assumes that some failures are acceptable as 
long as the overall system can recover from them. Actor 
programming is based on the same assumption: what matters is the 
health of the overall (big) system.
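
To make the process-isolation point concrete, here is a minimal 
Python sketch of a supervisor that restarts a crashing worker, 
loosely in the spirit of Erlang-style supervision. Everything in 
it is made up for illustration: the names `WORKER_SOURCE` and 
`supervise`, the restart budget, and the worker itself, which 
simulates a crash by exiting with a non-zero status on its first 
two attempts. Note what it does not do: it throws away the 
worker's memory, but it cannot judge whether anything the worker 
wrote elsewhere before crashing is still trustworthy.

```python
import subprocess
import sys

# Hypothetical worker: exits with an error on its first two attempts,
# standing in for a segfault, range error or failed assert.
WORKER_SOURCE = """
import sys
attempt = int(sys.argv[1])
if attempt < 3:
    sys.exit(1)  # simulated crash
print("work done on attempt", attempt)
"""


def supervise(max_restarts=5):
    """Run the worker in its own process, restarting it when it crashes.

    Returns (attempt_number, worker_output) for the first successful run.
    Raises RuntimeError once the restart budget is used up.
    """
    for attempt in range(1, max_restarts + 2):
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, str(attempt)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return attempt, result.stdout.strip()
        # The crashed worker's memory is gone, but anything it wrote to a
        # database or sent as a message before crashing is not. The
        # supervisor has no way of knowing whether that state is sound.
    raise RuntimeError("worker kept failing; giving up")


if __name__ == "__main__":
    print(supervise())
```

Picking `max_restarts`, and deciding which persisted state to 
trust after a restart, is exactly the kind of probabilistic, 
application-specific call that a human has to make.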